Can ChatGPT be a useful tool for scientific research? – Part 1
Today I start a series attempting to test the reliability of ChatGPT, an artificial intelligence chatbot from OpenAI, in sensitive areas, especially international relations. In this series, I will tackle both the political and military sides of international relations. I will start with some simple questions and check the facts. Later, I will make the questions more complex, moving from the description of facts to more analytical and value-laden issues.
For this first part, I pose some simple chess questions to check the facts. I will also ask a few initial “why” questions, which are generally more difficult to answer.
“Conversation” between Jürgen Dieringer and ChatGPT as of 24.04.2023. I am using the free version of ChatGPT, which works with a dataset that runs until September 2021.
Who is world chess champion?
As of my knowledge cutoff date of September 2021, the current World Chess Champion is Magnus Carlsen from Norway. He has held the title since 2013 and has successfully defended it in several matches since then. However, there may have been changes since then that I am not aware of.
Fact-check: All correct. The system has no reference to the ongoing chess world championship, as that was scheduled after the timeframe the system uses.
Who was the highest-rated chess player in history?
As of my knowledge cutoff date of September 2021, the highest-rated chess player in history, according to the FIDE rating system, was the Norwegian grandmaster Magnus Carlsen, who achieved a peak rating of 2882 in May 2014. However, it’s worth noting that the FIDE rating system has undergone some changes over time, so comparing ratings from different eras can be difficult.
Correct, and the reservation is absolutely needed here, as the measurement system is not entirely stable. So let’s go a bit deeper on that.
Who was the stronger player, Kasparov or Carlsen?
It’s difficult to directly compare the playing strength of Garry Kasparov and Magnus Carlsen since they played in different eras and faced different opponents. However, both Kasparov and Carlsen are considered to be among the greatest chess players of all time and have achieved many impressive victories and accomplishments in their careers.
That is a very diplomatic answer. It sounds as if Chatty-AI went through one of my IR classes. 😊
Kasparov was the dominant player in the 1980s and 1990s, holding the world number one ranking for 255 months in a row, and was widely regarded as unbeatable for a long time. He won numerous prestigious tournaments and played many memorable matches against top players, including his famous matches against the IBM computer Deep Blue.
Carlsen, on the other hand, has been the world number one player since 2011 and is known for his exceptional endgame skills and versatility. He has won numerous major tournaments and has defended his World Chess Championship title several times.
Ultimately, it’s hard to say who is the stronger player as both have different styles of play and have achieved great things in their respective eras.
Facts are OK
Is the ELO system stable?
The Elo rating system, which is widely used in chess and other games to rate the relative skill levels of players, is generally considered to be stable and reliable. The system is based on a simple mathematical model that takes into account the ratings of the players involved in a match, as well as the outcome of the match, to determine how much each player’s rating should change.
One of the strengths of the Elo system is that it is based on a large sample of games played over time, which helps to reduce the impact of random fluctuations and gives a more accurate picture of a player’s skill level. In addition, the system is designed to be self-correcting, so that players who consistently perform above or below their rating will see their rating adjust accordingly over time.
However, like any system, the Elo rating system is not perfect and there are some limitations and potential sources of error. For example, the system assumes that players’ skill levels remain constant over time, which may not always be the case. In addition, the system can be affected by factors such as playing conditions, fatigue, and psychological factors that may influence a player’s performance.
Overall, while the Elo rating system is not perfect, it is generally considered to be a reliable and effective tool for rating player skill levels in a wide range of competitive games and sports.
I don’t know why he refers to limitations like “fatigue” and “psychological factors”. Either you win a game or you don’t, meaning that you are stronger than the other player or not. Then again, have you ever met a chess player who lost a game in perfect health or without the “circumstances” being against him? ☹
Anyway, let’s ask ChatGPT slightly differently:
Is the ELO system inflationary?
The Elo rating system is designed to be self-correcting, which means that it is not inherently inflationary. In other words, the system is designed to adjust for changes in the skill level of the player pool over time, and as a result, the average rating of players should remain roughly the same.
However, there are some factors that can contribute to inflation in the Elo system. For example, if the player pool is expanding and new players are entering the system with lower ratings, this can cause ratings to become inflated over time. Additionally, changes in the rules or playing conditions that favor certain types of players or styles of play can also lead to inflation.
To address these potential sources of inflation, the Elo system includes several mechanisms that are designed to keep ratings in check. For example, the system uses a “K-factor” to adjust the amount that a player’s rating changes based on the rating difference between the two players and the number of games played. This helps to prevent rapid changes in ratings that might otherwise contribute to inflation.
Overall, while the Elo rating system is not immune to inflation, it is designed to be self-correcting and includes mechanisms to prevent inflation from becoming a significant problem over time.
Pretty good answer. Let’s double-check:
Why are there more higher-rated chess players today than in the 1990s?
There are several reasons why there are more higher-rated chess players today than in the 1990s:
- Increased accessibility to training and resources: With the advent of the internet, chess resources, such as online training materials, chess software, and databases, have become much more accessible to players worldwide. This has made it easier for players to improve their skills and has increased the overall level of play.
- More tournaments and opportunities to play: There are now many more chess tournaments being organized around the world than in the 1990s, with a wider range of formats and levels of competition. This has created more opportunities for players to gain experience and increase their ratings.
- Improved rating systems: The Elo rating system used in chess has undergone some changes and improvements over the years, which has made it more accurate and reliable in rating players’ skill levels. This has led to a more accurate reflection of the relative strengths of players and has allowed for more fine-grained distinctions between players of similar strength.
- Greater interest and popularity in chess: Chess has become more popular and widely followed around the world, with more people playing the game and following top-level competitions. This has created a larger pool of talented players and has driven up the overall level of play.
Overall, these factors have combined to create a more favorable environment for chess players to improve their skills and achieve higher ratings than in the past. As a result, the overall level of play has increased, and there are more highly rated players than ever before.
Pretty good answer, again. Still, I am missing the reference to the “K-factor” ChatGPT had in the previous response. Let’s ask directly:
What is the K-factor in the ELO system?
The K-factor is a variable in the Elo rating system that determines the amount that a player’s rating changes after a game. The K-factor represents the maximum amount that a player’s rating can change in a single game, and it is typically higher for new or unestablished players and lower for more experienced or established players.
In the Elo system, the formula for calculating a player’s rating change after a game is as follows:
ΔR = K × (W – We)
Where ΔR is the player’s rating change, K is the K-factor, W is the result of the game (1 for a win, 0.5 for a draw, and 0 for a loss), and We is the expected result based on the players’ ratings.
The purpose of the K-factor is to balance the accuracy of the ratings with the need to allow for changes in player skill levels over time. A high K-factor allows for more rapid changes in ratings for new or unestablished players, while a lower K-factor allows for more gradual changes in ratings for more experienced or established players.
The K-factor is usually set by the governing organization or rating system and can vary depending on factors such as the level of competition, the number of games played, and the quality of the players involved. In general, the K-factor is higher for lower-rated players and lower for higher-rated players, with a typical range of values from 10 to 40.
OK, that looks accurate. In the relevant Wikipedia article, I like the reference to Arpad Elo, the system’s creator, who is not mentioned in the ChatGPT answer. Wikipedia uses the expression “zero-sum-game,” which I like, as this alone suggests that there is no inflation and the system should be stable.
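To make the formula and the zero-sum idea concrete, here is a minimal Python sketch. It is not taken from the ChatGPT answer. It assumes the commonly used logistic formula for the expected result We (with the 400-point scale), which the answer above does not spell out. The function names, the ratings and the K value of 20 are hypothetical, chosen only for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected result (We) of player A against player B.

    Assumption: the standard logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_change(rating: float, opponent: float, result: float, k: float = 20.0) -> float:
    """Rating change for one game: delta_R = K * (W - We).

    result is 1 for a win, 0.5 for a draw, 0 for a loss.
    k is the K-factor (20 is an illustrative value, not an official one).
    """
    return k * (result - expected_score(rating, opponent))


if __name__ == "__main__":
    # Hypothetical ratings for two players; A wins the game.
    a, b = 2800.0, 2700.0
    delta_a = rating_change(a, b, 1.0)
    delta_b = rating_change(b, a, 0.0)
    print(f"A gains {delta_a:.2f}, B changes by {delta_b:.2f}")
    # With the same K for both players, the two changes cancel out.
    print(f"Sum of changes: {delta_a + delta_b:.6f}")
```

With the same K for both players, whatever one player gains the other loses, which is the zero-sum property. If the two players have different K-factors, as the ChatGPT answer describes for new versus established players, the changes no longer cancel exactly, so the pool total can drift.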
Overall, OpenAI’s ChatGPT provides quite reliable results for this relatively straightforward set of questions. Next time I will go a bit deeper and see what the AI produces when we ask questions about the international order.