The ELO ranking system is an inadequate method of surveying one's skill level on DArt
The debate is finished. The distribution of the voting points and the winner are presented below.
After 1 vote and with 3 points ahead, the winner is...
- Publication date
- Last updated date
- Type: Standard
- Number of rounds: 3
- Time for argument: Two days
- Max argument characters: 10,000
- Voting period: One week
- Point system: Multiple criterions
- Voting system: Open
ELO is a score attached to all our profiles. It starts out at 1500 and adjusts in response to our wins, losses, and ties. Under this system, it's a bigger deal to beat someone with a higher score than it is to beat someone with a lower score. It's derived from Chess rankings and was developed by someone named Elo. There's a whole Wikipedia page devoted to it. On DART, our rankings on the Leaderboard are determined by ELO score.
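To make the description above concrete, here is a minimal sketch of the textbook Elo update. The K-factor of 32 is an assumption chosen for illustration; the debate never states which K-factor or variant DART actually uses, so the exact point values here are illustrative only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating_a: float, rating_b: float,
                   actual_score: float, k: float = 32.0) -> float:
    """New rating for A after one result.

    actual_score is 1.0 for a win, 0.5 for a tie, 0.0 for a loss.
    k = 32 is an assumed K-factor, not a confirmed DART setting.
    """
    return rating_a + k * (actual_score - expected_score(rating_a, rating_b))


start = 1500.0  # starting score mentioned in the description

# Beating a higher-rated opponent is worth more than beating a lower-rated one.
gain_vs_higher = updated_rating(start, 1600.0, 1.0) - start
gain_vs_lower = updated_rating(start, 1400.0, 1.0) - start
print(gain_vs_higher, gain_vs_lower)

# A tie between equally rated debaters leaves the rating unchanged.
print(updated_rating(start, 1500.0, 0.5))
```

With these assumed settings, a 1500-rated debater gains roughly 20.5 points for beating a 1600 and roughly 11.5 for beating a 1400, which is the "bigger deal" asymmetry described above.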
- Elo goes up when you win a debate.
- Elo goes down when you lose a debate.
- Elo is affected by winning/losing.
- Elo stays the same when the debate is on Unranked mode.
- This example is one of my unranked debates, and if I had won the debate in ranked mode, in theory, I should be somewhere around 1450 instead of as low as I am now (1427). [1][2]
- This would show that if a person always debates in Unranked mode, no matter how good/bad he is at debating, he stays at 1500 (until he accepts a ranked debate and wins/loses).
- People are, presumably, at different skill levels. There are legends like Ramshutu, Oromagi, and RM, then there are trolls at the bottom of the barrel like Ramdatt.
- So, if I am a very good debater, but I am inferior to those legends, yet I only pick ranked debates with those legends and lose most of the time, I will have a 1400 Elo even though my skill is at least top 30.
- People may have different Elo on different accounts, and different people with different skill levels may have the same Elo.
- On DArt, PinkFreud and LittleCookie are the same person, so they have a similar skill. However, PinkFreud has 1614 Elo and LittleCookie 1500. [3][4]
- BlueCrystal and 3RU7AL both have a 1500 ranking, but BlueCrystal only offered minimalistic, single-line responses while 3RU7AL exhibited coherent logic, and the only vote on his debate awarded him the argument points. [5][6]
- Because these two people have different skill levels,
- And they have the same Elo,
- This would mean the Elo system is not an adequate method for surveying people's skill level on DArt, at least for this example.
- Envisage, an intelligent debater from DDO [7], came here and forfeited all of his debates. [8]
- If a person chooses not to try in debates, it is not a sign that this person is bad.
- One debate called "The Earth is Round" got a number of votes, but Con failed to respond at all, and Pro won without exhibiting skill in debating. [9]
- If a person wins a debate without exhibiting much skill, it is not a sign that this person is good.
- Elo depends on how people voted. If an intelligent debate with an obvious winner gets zero votes, then even though the "virtually winning" side is more skilled than the "virtually losing" side, neither gets any judgment. (And if both are users who just fled from DDO, they will be regarded as equal under Elo even though one is better than the other.)
- The person with the most Elo tends to be someone who debates more, because more debates won = more points added. Skilled debaters who don't debate much will not get the same number of points as debaters at the same level who debate more. Elo is effective at counting how many debates are won here, and people may conclude that more debates won = better; however, these examples suggest otherwise.
- Debating is not chess. It can be very different. One who is good at Rap Battling may not be good at regular logic debating.
- I do not need to give a better example as I only need to prove why Elo is an inadequate system in this sense.
In this context, when we say skill, we refer to the skill of a debater on this site. So, what makes for a skilled debater? Of course, we all know a skilled debater when we see one. Similarly, we know a poor debater when we see one. Simply put, we all share an intuitive understanding of what makes a debater skilled or unskilled. The challenge my opponent and I face - and the challenge ELO faces - is meaningfully articulating this intangible sense.
In the realm of social science, operationalization is defined thus:
Operationalization is the process by which a researcher defines how a concept is measured, observed, or manipulated within a particular study. This process translates the theoretical, conceptual variable of interest into a set of specific operations or procedures that define the variable’s meaning in a specific study.
- Look at one example.
- This example mainly shows why Elo was not a good system for video games. However, the reasoning behind this can also be used for Debating.
- This would support my idea: if a former DDO veteran just transferred to this site (with, of course, 1500 Elo) and immediately won against a DArt veteran with 1600 Elo, would you say that the former is actually at 1500 Elo? No! The former is above 1500 Elo based on his true skills.
- The text states that: For starters, the Elo system was created to gauge player skill on a 1v1 basis. This is pretty simple to accomplish because the players have full control over their destiny. If Player A, rated at 1000 Elo, beats Player B, rated at 1400, there should be a substantial swing in both player’s ratings because Player B was clearly supposed to have no problems beating Player A. When there isn’t any random element involved in the game, like chess, this makes complete sense. Player B is expected to consistently play at a 1400 rating level. Seeing as how he lost to someone with only 1000 Elo, he’s clearly no longer playing at a 1400 rating.
- Assume said DDO veteran has a skill level equal to 1629 Elo in DArt terms. Said DArt veteran should then lose fewer points than he did, because he was really debating someone at a higher level than the rating suggests. Likewise, because DDO veterans are winning debates while listed at 1500 Elo, they gain points more quickly than they should, since their real level is above 1500. Their Elo does not reflect their skill anymore.
- I extend all my arguments made above since my opponent never attempted to refute them during this 2-day period. Everything you see here is built on top of the last argument.
- The more often you win, the more "skilled" you are, according to Elo.
- Oromagi is the top debater on the entire site because he debates very frequently and wins basically all the time. The DArt base clearly thinks Bsh1 and Blamonkey are more skilled than Oromagi because the quality of the debate of these two debaters is better than Oromagi's average debates. Oro had admitted he had sniped noobs and trolls on the site because he wants to boost his performance. I admit that Blamonkey has also debated less experienced noobs and I-give-up forfeiters, but Blamonkey's quality is higher. Bsh1, on the other hand, was one of the most skilled people on DDO, and the reason he is not ranked as high as we think he should be is that he LEFT THE SITE. Oro has also admitted that other debaters are better than him and was pretty surprised that he beat every single one of them.
- CONCLUSION: THE TOP DEBATERS ARE NOT NECESSARILY THE MOST SKILLED. THEY ARE JUST THE MOST FREQUENT WINNERS.
- CONCLUSION: IF THE TOP DEBATER ISN'T THE MOST SKILLFUL, IT MEANS THIS SYSTEM SHOULD BE REVISED.
- Also, debates can range from a full forfeit to a true clash of ideas. The "round earth" debate listed last round shouldn't boost Pro's ranking because he didn't exhibit any skill. Unvoted and unranked debates should add something to both users' accounts based on skill, especially if you can see who won or lost. If unranked and tied debates exist, it means both users might not have an accurate ranking. A true skill-based ranking system should have an AI determine who won and who lost instead of people voting on it, because soon enough the winners will learn to appease most voters (Oromagi did it) instead of exhibiting the most potential. The goal is shifted.
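The mis-rated newcomer argument from the earlier round (a DDO veteran listed at 1500 whose real strength is closer to 1629 beating a 1600-rated DArt veteran) can be put into numbers with the textbook Elo formula. This is only a sketch: the K-factor of 32 is an assumption, the 1629 figure is the hypothetical from the argument itself, and nothing here is confirmed to match DArt's actual implementation.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


K = 32.0  # assumed K-factor, not a confirmed DArt setting


def points_lost_on_defeat(own_rating: float, opponent_rating: float) -> float:
    """Points a debater loses after losing to an opponent with the given rating."""
    return K * expected_score(own_rating, opponent_rating)


veteran = 1600.0
listed_newcomer = 1500.0  # rating shown on the newcomer's profile
true_newcomer = 1629.0    # hypothetical "true" rating from the argument

loss_as_listed = points_lost_on_defeat(veteran, listed_newcomer)
loss_if_accurate = points_lost_on_defeat(veteran, true_newcomer)
print(loss_as_listed, loss_if_accurate, loss_as_listed - loss_if_accurate)

# The quoted article's upset: a 1000-rated player beats a 1400-rated player.
upset_gain = K * (1.0 - expected_score(1000.0, 1400.0))
print(upset_gain)
```

Under these assumptions the veteran loses about 20.5 points against the listed 1500 rating but would lose only about 14.7 had the newcomer been listed at 1629, so the size of the swing depends on how accurate the newcomer's starting rating is.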
- Quality arguments = a debater who puts forward logical, clear, and relevant arguments is demonstrating skill as a debater
- Quality writing = a debater who can articulate their arguments with concise, understandable writing is demonstrating skill as a debater
- Consistency = a debater who has a long track record of demonstrating the above attributes is further proving their skill as a debater (why? The longer someone’s impressive track record, the more confident we can be that their skill is durable and not just a “flash in the pan”).
I say “results that generally” match our intuition, because operationalization will never be perfect. By its nature, the process of operationalization loses some of the phenomenon’s qualitative richness in its pursuit of objective measurement. Even so, operationalization is valuable because it allows us to assess broad trends and support claims with objective data. If we refused to quantify anything, we’d be left in an ambiguous state where each person’s personal opinion carries no more weight than another’s. Therefore, operationalization, though imperfect, is a valuable tool for understanding reality in an objective, consistent manner.
Similarly, the test for DART’s ELO is whether the scores and rankings generally comport with our intuitive sense for what makes a good debater.
Based on my calculations, the top 10 debaters on this site have an average win ratio of 88.15%. Meanwhile, the bottom 10 debaters possess an average win ratio of 19.75%. A tougher quantitative test for ELO would be to examine the win rates of those just below the top 10 debaters. Predictably, the next group, 11-20, sports a slightly lower win rate, coming in at an average of 84.59%. The next group below them, 21-30, possesses a win rate of 72.51%.
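The comparison above can be checked mechanically. The sketch below uses only the four group averages quoted in the argument (the per-debater figures behind them are not given) and tests whether win rate falls at every step down the leaderboard. The variable names are made up for illustration.

```python
# Average win rate (%) per leaderboard band, as quoted in the argument.
win_rate_by_band = [
    ("top 10", 88.15),
    ("11-20", 84.59),
    ("21-30", 72.51),
    ("bottom 10", 19.75),
]

rates = [rate for _, rate in win_rate_by_band]

# True if every band has a strictly lower average win rate than the band above it.
is_strictly_decreasing = all(
    higher > lower for higher, lower in zip(rates, rates[1:])
)

# Drop in win rate (percentage points) between consecutive bands.
drops = [round(higher - lower, 2) for higher, lower in zip(rates, rates[1:])]

print(is_strictly_decreasing)
print(drops)
```

This only shows that the group averages are ordered the same way as the ELO bands; it says nothing about individual debaters within a band.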
And again, it’s not my job to argue that ELO is perfect in all cases. It’s not my job to demonstrate conclusively that the number 1 ranked debater is better than the number 2 ranked debater, and so on and so forth.
Look at the previous argument: sources are nowhere to be found. Either way, my sources are real examples that support my claim and my opponent's sources are just elaborate definitions.
The DArt base clearly thinks Bsh1 and Blamonkey are more skilled than Oromagi because the quality of the debate of these two debaters is better than Oromagi's average debates.
Oro had admitted he had sniped noobs and trolls on the site because he wants to boost his performance.
CONCLUSION: IF THE TOP DEBATER ISN'T THE MOST SKILLFUL, IT MEANS THIS SYSTEM SHOULD BE REVISED.
So far, I have proved that Elo has no direct correlation with skill, which defeats its whole purpose.
Look at one example.
- Some debates are unranked, thus, any skill exhibited therein will not affect the participants' ELO scores
- Differences in opponent quality could result in a higher ELO debater actually being less skilled than a lower ELO debater
- Many noobs who start out with 1500 ELO may not deserve this score, for whatever reason
- Sometimes people forfeit. Sometimes debates don't get votes
- ELO is an operationalization of debater skill
- A good operationalization is one that produces results that generally comport with our intuitive understanding of the phenomenon in question
- ELO scores generally comport with our intuitive understanding of what makes for a good debater
- Therefore, ELO ought to be considered an adequate measurement of debater skill on DART
I am not claiming that ELO is an exact, perfect measurement of your quality as a debater. ELO is just one part of your overall “portfolio,” if you will. And again, it’s not my job to argue that ELO is perfect in all cases. It’s not my job to demonstrate conclusively that the number 1 ranked debater is better than the number 2 ranked debater, and so on and so forth. Rather, my job is to demonstrate that ELO is an adequate measurement (or, “operationalization”) of skill. I have argued that for a measurement to be adequate, its results must generally comport to our intuitive understanding of what makes for a skilled debater. By demonstrating the positive correlation between win rate and ELO, I believe I have done so.
- My opponent has identified instances of "system breakdown." That is to say, he has identified situations where ELO will fail to 100% accurately reflect skill.
- My argument can withstand these points.
- My opponent has not substantively addressed the key elements of my argument.
- Therefore, my claim that ELO is an adequate measurement of skill should be considered to stand (at least at this point in the debate).
Leaderboards are a visualization of achievement. The purpose of a leaderboard is to show people where they rank in a gamified system. The leaderboard shows them where they stand in relation to their peers.
If you know the person at the top has a score of 900 points, you'll know it's at least possible to increase your skill to reach that higher score.
A common use for a leaderboard could be to represent a sales team. Each item in the list would represent a sales person and their sales over a period of time.
- Leaderboards should be able to tell you who is best at a game.
- Leaderboards should tell you who has the best time in a game.
The purpose of a leaderboard is to show users where they rank in a gamified system. Those at the top enjoy the notoriety it brings; as for everyone else, the leaderboard shows them where they stand relative to their peers.
(Also, my opponent forgot to put the sources AGAIN)
Oh wait, there is no need for rebuttals because I went for my opponent's roots instead of its ends. Everything above suffices to respond to my opponent's points.
Leaderboards are to measure one's skill exactly, and Elo isn't doing that. It may hint towards higher skill but it is not directly exact.
- My sources say ELO should be a perfect measurement of skill
- ELO is not a perfect measurement of skill
The leaderboard on DArt, which runs on Elo, does not necessarily survey skill. Many good debaters aren't getting the ranking they deserve because they are only here once in a while (Blamonkey, etc.), while someone equally good or not even as good might get a higher ranking (Oromagi, etc.). Thus, in order to get to the top of the leaderboard, one would focus on quantity (the number of debates won) instead of quality (skill).
- ELO is an operationalization of debater skill
- An adequate operationalization will produce results that generally comport to our intuitive understanding of the phenomenon in question
- ELO produces rankings that generally comport to our intuitive understanding of the debater skill
- Therefore, ELO ought to be considered an adequate measurement (or, "operationalization") of debater skill
In gist: ELO is a flawed system which gives us a decent comparative estimate. As a surveying technique, it seems to do its job, ranking people more likely to win higher and vice versa. So: flawed, and a better system could be made, but adequate.
Operationalism is a word I haven't heard in too long. And it is very hard to argue "qualitative and not quantitative" when numbers are applied, making it a quantitative measurement, even if a flawed one (if curious I can expand on this in the comments). The comparison to win ratios did well in defending ELO.
"I would prefer having an AI telling me who won and who lost based on textual evidence." Damn, that brings back memories of old arguments. My mind automatically goes to how that could be gamed more easily (plus I generally advise against mentioning spam island... in this case I'm genuinely curious, but without a source showing the system they claim to have, it doesn't carry the day as it may have been intended to).
"If Elo has no correlation with skill," I disliked this line a lot, as that exact correlation had already been shown.
Sources lean toward pro, but as exemplified with the gamification article, con was able to leverage pro's own sources against him to keep this within the tied range.
The correlation was shown by arguments presented in the debate. I would also say ELO shows persistence, which could be part of where the error value would stem from if doing a statistical model of it. In almost any system, there will be outliers such as Virt who is great, but lacks the persistence.
You are of course encouraged to share any thoughts on refining the voting standards to have fewer votes struck down. There is currently a thread for it, which might generate some referendum questions: https://www.debateart.com/forum/topics/4310-what-would-your-ideal-voting-policy-look-like
That said, your proposed judgement system can easily be done with special rules and Judicial Decision (the judges being selected via agreement to just vote however pro and con indicate in the final round; to which I would be happy to assist via being such a judge).
Not in my case though. I have encountered countless veterans, and that has resulted in me not winning any debates in the last 2 weeks.
> "If Elo has no correlation with skill," I disliked this line a lot, as that exact correlation had already been shown.
ELO seems to be a good measure of persistence.
Since noob sniping is apparently not penalized and there seems to be a large number of forfeits from people who lose interest, anyone who grinds as many debates as possible and never forfeits will be rewarded for their effort.
Even highly skilled, highly ranked debaters, like Danielle on the old site, seemed to overwhelm their opposition with a Gish Gallop of citations and had the favor of the moderators, who were empowered to strike down any votes against them for "insufficient RFV".
Thanks for the mentions.
I was originally very excited to participate in ranked debates, but I quickly learned that no matter how "logical" and "objective" the voting guidelines were believed to be, the actual judges themselves are incapable of acknowledging their own bias blind spot.
I have proposed that all debates be "self-moderated", that is to say that only the two participants in each debate are allowed to vote.
This way, the goal of the debate is to ACTUALLY CONVINCE YOUR DEBATE PARTNER and not simply make them look silly in order to sway an audience.
It seems like such an insanely simple solution to what many consider "a virtually intractable problem".
Thank you for voting.
bump
Hey y'all,
You were all mentioned in this debate, so I'm tagging you in case you're interested in voting.
bump
bump
bump
Thank You.
Here's some feedback. I know you didn't ask for it lol, but I think it will help you.
Try to organize your argument better. Put barriers between your contentions, and between your prelude. Also, put links directly on numbered sources.
I knew that ELO was the ranking in our profiles, but I thought it was an acronym I didn't know the meaning of. No wonder I didn't know. I detest wiki. Reminds me of encyclopedia salesmen. Yes, there was once such a profession.
ELO is a score attached to all our profiles. It starts out at 1500 and adjusts in responses to our wins, losses, and ties. Under this system, it's a bigger deal to beat someone with a higher score than it is to beat someone with a lower score.
It's derived from Chess rankings and was developed by someone named Elo. There's a whole Wikipedia page devoted to it.
On DART, our rankings on the Leaderboard are determined by ELO score.
You might be interested.
Sorry. What is ELO as related to ranking?
Wow, unexpected. Got myself a big opponent here who won both of our debates.
I was too enthusiastic to wait any longer.
If you increase to at least one week for argument I *WILL* accept.
This must be true since I can easily identify many debaters with greater skill than me but lower ELO. ELO measures win/loss relative to ranking but in no way surveys skill.
If you switch the time for arguments to at least one week, I'd strongly consider taking it.
Thank you :)
Experience is not the same as skill.