Instigator / Pro: 12 points | rating 1476 | 4 debates | 25.0% won
Topic
#667

Which voting moderation technique scales better, crowd sourcing or manual administration?

Status
Finished

The debate is finished. The distribution of the voting points and the winner are presented below.

Winner & statistics
Criterion            Pro   Con
Better arguments       6     0
Better sources         4     4
Better legibility      2     2
Better conduct         0     2

After 2 votes and with 4 points ahead, the winner is...

MrMaestro
Parameters
Type: Standard
Number of rounds: 4
Time for argument: Three days
Max argument characters: 12,000
Voting period: One month
Point system: Multiple criterions
Voting system: Open
Contender / Con: 8 points | rating 1687 | 555 debates | 68.11% won
Description

My belief is that the current voting moderation system - manual review - does not scale well. I recently proposed that crowd-sourced initiatives were the solution. RM has chosen to defend the current structure.

https://www.debateart.com/forum/topics/1563?page=1&post_number=22

Resolution: Crowd-sourced voting moderation features scale better than manual administrative voting moderation techniques.

--Pro will be forced to argue using crowd-sourced solutions alone, as agreed.
--Con will be forced to argue using manual, human-labor solutions alone, as agreed.

Definitions:
*Merit: Each idea is to be judged on its ability to handle increased traffic with regard to vote moderation capabilities.

*Crowd-sourced: In this context means computer algorithms that make use of feedback from the general Dart population as opposed to feedback from specific mods.

*Scale: Meaning the ability to handle a growing amount of work (votes needing moderation) in a capable, labor efficient manner.

Round 1
Pro
#1
Thank you for accepting my debate, RationalMadman. This should be interesting! I'm new here at DART. I was initially drawn to the clean UI, but I stuck around because of the high-quality content. DebateArt understands that strict voting standards ensure high-quality debates. I think that's what sets this place apart from other websites.

I want to see this website grow because it's a fantastic place! My concern, however, is that the process of manually reviewing every single bad vote does not scale well. Sure, moderators could add more moderators, but this gets slowed down by politics (i.e. some mods don't want to water down their authority). Web traffic fluctuates wildly. This can mean periods of massive overload and massive downtime. Rigid systems, like the one currently in place, do not respond well to wildly fluctuating input.

Automatic systems don't have this problem. I think this website could achieve high quality, automatic vote-moderation using the following four features:

Judge rankings. Build a feature that lets users "rate" a judgment. Someone could "downvote" a judgment while explaining how it didn't meet the voting standards, which would decrease a user's overall judgment rating. Conversely, users could also "upvote" a judgment, rating it as constructive and high quality, which increases "judge rating", encouraging good votes.

Automatic vote removals. Using the vote system above, a judgment with too many downvotes would automatically be deleted/reduced.

Abuse prevention: A great side effect of this system is that it acts as a strong indicator of abuse. Trolls who consistently cast terrible votes will see their judgment score drop rapidly. A new account with repeated bad votes could be flagged or have its voting privileges temporarily suspended, for example.

Bias/Bully detection: This feature involves tracking how often someone votes for/against another user. If they seem to be "targeting" someone, then that could be an indication of bullying, for example.

As a novice developer, I wanted to pick features that I thought would be relatively simple to build. These features are realistic and would maintain high-quality votes while massively reducing workload. This is the power of using crowd-sourced data to automate complex problems. 
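
As a rough illustration of the first two features (judge rankings and automatic vote removals), here is a minimal Python sketch. It is not DebateArt's code: the names Judgment, rate and judge_rating, and the removal threshold of -3, are assumptions made up for the example.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: net score at which a judgment is auto-removed.
REMOVAL_THRESHOLD = -3


@dataclass
class Judgment:
    """A vote cast on a debate, which other users can rate up or down."""
    judge: str
    ratings: dict = field(default_factory=dict)  # rater name -> +1 or -1
    removed: bool = False

    def score(self) -> int:
        """Net rating: upvotes minus downvotes."""
        return sum(self.ratings.values())


def rate(judgment: Judgment, rater: str, value: int) -> None:
    """Record one rating per rater and remove the judgment if it sinks too low."""
    if value not in (1, -1):
        raise ValueError("rating must be +1 or -1")
    if rater == judgment.judge:
        raise ValueError("judges cannot rate their own judgment")
    judgment.ratings[rater] = value  # a repeat rating overwrites the old one
    judgment.removed = judgment.score() <= REMOVAL_THRESHOLD


def judge_rating(judgments: list, judge: str) -> int:
    """Overall judge rating: the summed score of all of that judge's judgments."""
    return sum(j.score() for j in judgments if j.judge == judge)
```

In this sketch a removed judgment comes back if later ratings lift it above the threshold. Whether removal should be permanent is a design choice the proposal leaves open.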






Con
#2
Optional Song to replay (right-click the video and click 'Loop') as background music: https://www.youtube.com/watch?v=NLHdxG1KHss

[#] = Source #

There are three versions of the 'crowd sourcing' that Pro is suggesting. (There are actually four, but the fourth is what my side is and how I will win, as manual moderation is an extension of the concept of crowd sourcing. This Kritik will near-destroy Pro's basis, but it won't be enough on its own, as I still have to justify it over the other three types.)

I don't have a source for this, as even Wikipedia splits everything up wrong (into fields of study, or types of web-related elements within those fields/industries), but I am guaranteeing you that there are three fundamental forms of 'crowd sourcing', plus a fourth type that is actually what manual administration ends up maintaining itself on. That fourth type is not the same thing, I agree with Pro on that, and I will explain the difference promptly.

  1. The most popular wins: an up-vote-only scheme where up-votes cost nothing in terms of in-site currency, or hopefully real-life currency either (meaning on top of any membership fee, not inclusive of it). A prominent company that brought this into the mainstream Internet was Facebook. They realised that removing the down-vote not only inflates the up-votes (as the only way to express dissatisfaction with one thing is to up-vote things that oppose it) but also has psychological benefits: it kept users pleased and hooked to the site, even if many hated what they did, so long as those people didn't get to comment abusively and couldn't down-vote.[1][2]
  2. The 'this vs that' mentality. It works by one of the following:
        1. Having freedom to up-vote and down-vote with no cost or limit. (YouTube is the most popular example, and was to this system what Facebook was to the up-vote-only system. I don't have a source for this, but it's a well-known fact, and they came up with it even before Google owned them.)
        2. Making what would be the first form (up-vote only) cost something in terms of in-site currency (the cost could simply be drawn from a limited number of votes per day for the same person). This is used on sites where the rating is meant to have the most value and to avoid 'friends voting for friends' outweighing actual quality.
        3. Making down-votes cost something (often from the point-stack) while up-votes are free. (Used in some low-key forums where points are the only currency, such as CreateDebate.com. Warning: it sometimes has rude content.)
        4. Having downvotes only... Yeah, this has never been successfully implemented as far as I know, as no one enjoys it, not even the 'haters', as they'd rather upvote what they like more often than not.

  3. The third type of crowd-sourcing focuses much more on the percentage, or difference, between the upvotes and downvotes someone receives, and uses this to predict the weight (applied via AI at times) of their future votes. I came across this on a rap battle website that seems to have nearly invented the concept. This doesn't just do what Pro would have done. It's not only retroactive in weight: it gives you 'credit' that only the admins get to see, and your 'vote credit rating' lets your one vote be something like 3x as valuable as a noob voter's and 5x as valuable as a low-ranked voter's on their system. This crowd-sourcing method requires manual administration anyway, as the admins have to 'approve' the rating and make sure it's picking things up correctly. Basically, the frequency with which people gave your votes a thumbs up, together with mods and admins enjoying the quality and depth of your voting and your lack of any apparent bias, would give you massive weight relative to other voters. (It can become 4x the weight of a noob and 6-7x that of the lowest voter if you're extremely highly rated, but as none of it was public I don't know, nor do I want to reveal, the math I think they used, as it's their intellectual property.) The website that uses it may not actually want me to give them credit for it (intellectual property and secrecy), so I won't name them. I will just say I was never privy to the inner workings, and what I say is based on speculation. They did seem to admit to quite a few users that they weighted votes in this way, though, so I'm not so sure it's wrong to share. I was so impressed by it that it inspired this. An alternative is enabling trusted users to have massive weight automatically, combining a somewhat manual administration with crowd-sourcing. I saw this on a website for Mafia that, again, may not want me naming them.

  4. The secret type of crowd-sourcing is based on proportional representation and completely Kritiks Pro's dichotomy at the roots:

The users of the website are the crowd-source towards the Mods. At the very least, if they dislike the moderation and the site is a dictatorship, they still crowd-source the website via off-site rankings and comments on comparison sites. Even these comparison sites use their own manual (well, automated but undemocratic) means of comparing; some have no comments section at all and calculate popularity from how frequently a site is used, measured in any publicly available manner. The most respected site of this type (whose rating other rating sites tend to credit in their own ratings) is Alexa.[3]


The concept of proportional representation is not entirely present in manual moderation as it works far more by appointment than popularity or direct voting but the reason it is a false dichotomy in this debate is that it revolves around avoiding mob-rules, alt-abusers (which is linked to bot-abuse if you can get away with that), many 'fools' voting down good quality reasons for deciding (RFD) and many voting up a non-offensive, upbeat but poor quality vote in reasoning.

The fact that Pro admits they may need to be incorporated and that it's blatantly the mods and humans (not AI or algorithms) that will need to constantly alter the weighting or ability to so easily vote on votes makes my case for me. It is blatant that the user-base has agreed to who is in charge (or at least has not run the site dry by leaving for a competitor with better vote moderation), which means that even by the very principles of crowd sourcing, the site is correct in operating the way that it does. Do you want any Tom, Erica and Jack to vote on a debate's vote and have equal say? Debating is a fine art combined with science that takes severe finesse to judge. This isn't a game-show like X Factor, where it should begin with some level of manual moderation and then the people decide beyond that; it is the very opposite. The people get to vote and the superior judges get to decide which of the votes is actually valid or not. Debating is not a popularity contest; it is a contest of reasoning well and politely enough.

Sources:
Round 2
Pro
#3
Fundamentally, we are dealing with the problem of scale. Every crowd-sourced website faces a version of this problem eventually: how do we enforce community standards and quality control without manually reviewing everything? A voting moderation system is just a quality-control machine. Our goal, as quality-control engineers, is to maximize quality while minimizing work. This is always achieved through some form of automation.

My opponent has categorized "crowd-sourcing" into three/four arbitrary categories. I would argue that there are only two mechanisms that really matter - positive reinforcement and negative reinforcement. These reinforcement mechanisms form the environment in which users interact with each other. 

Positive Reinforcement - Upvoting, positive commenting, and increased ranking are mechanisms that encourage certain behavior.
Negative Reinforcement - Downvoting, negative commenting, decreased ranking, vote deletion/reduction, and vote-privilege suspension are negative reinforcement mechanisms that discourage certain behavior.

The ideologies that get enforced are dependent on group culture. Forums in academia and technology (StackOverflow for example) tend to focus on truth rather than popularity because that is what the community values. Other more casual online communities may hold different values and encourage different behavior. The website owner can modify this virtual environment anytime by adding or removing features, thus modifying group behavior. (Note that SO also has a voting system).

I propose that changing the virtual environment also changes group behavior. If we can make a user care about the quality of their judgments (say by decreasing their judgment ranking), then we can moderate their behavior. 

Quick example: DebateArt could separate the voting submission box into four separate text boxes for each voting point category. They could make textboxes required if points were awarded for that category. Additionally, they could add quick info popups on each category briefly explaining what is expected of a vote. Small subtle changes like this would massively influence voters to critically think about why they award points, and to read the Code of Conduct more thoroughly. This is an example of how we can alter the virtual environment in small ways to influence behavior.
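
The "required textbox" rule in this example could be checked with something like the Python sketch below. The four category names come from the site's point categories; the function name missing_explanations and the dictionary layout are assumptions for illustration only.

```python
# The four voting point categories used on the site.
CATEGORIES = ("arguments", "sources", "legibility", "conduct")


def missing_explanations(points: dict, reasons: dict) -> list:
    """Return the categories where points were awarded without any reasoning."""
    return [
        category
        for category in CATEGORIES
        # Points awarded but the matching text box is empty or whitespace only.
        if points.get(category, 0) != 0 and not reasons.get(category, "").strip()
    ]
```

A submission form would refuse the vote whenever this list is non-empty and point the voter at the boxes still to be filled in.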

-------------------------

So my opponent has touched on a lot of subjects here but hasn't taken a definitive stance on any of them. They haven't reinforced their position or made clear to me why manual administration is superior with regard to scale.


The false dichotomy assertion.
"The concept of proportional representation is not entirely present in manual moderation as it works far more by appointment than popularity or direct voting but the reason it is a false dichotomy in this debate is that it revolves around avoiding mob-rules, alt-abusers (which is linked to bot-abuse if you can get away with that), many 'fools' voting down good quality reasons for deciding (RFD) and many voting up a non-offensive, upbeat but poor quality vote in reasoning."
There is no dichotomy. The word "dichotomy" implies polarization, "choose one or the other". That isn't the case here. There is a spectrum of solutions between manual administration and automation, some more effective than others, especially when it comes to scaling.

  • Alt-accounts can often be automatically detected using cookies, or by examining a user's IP address and geographic location.
  • I'm not sure exactly what you mean by "Mob rules"; however, bullying and collusion are quite easily detectable when you know what to look for. This ties into what you were saying about percentages.
  • Ignorant voters or "novices" get conditioned quickly using the reinforcement mechanisms described above.
The four categories of crowdsourcing?
1 - Upvotes only -  optimized to make users feel happy
2 - Upvotes/downvotes - optimized to produce engaging content through competition
3 - Percentages/Weighted voting - Very much specific to this website; not crowdsourcing in general, but otherwise a fascinating idea which I will examine shortly.
4 - This is an argument against democracy in general, this doesn't really belong on this list.

So the arguments separately are decent, but the way you've clumped them together doesn't make sense to me. I think the positive/negative reinforcement model is a more apt categorization.



Weighted voting

Weighted voting has been around for a long time. It's prominently used in board meetings where a shareholder's vote is proportional to the amount of company stock they own. Applying this concept to debates is an interesting idea. If the DebateArt community decides that some shareholders (users) are more valuable (create better content) than other users, then this could be applied with great success.

"The fact that Pro admits they may need to be incorporated and that it's blatantly the mods and humans (not AI or algorithms) that will need to constantly alter the weighting or ability to so easily vote on votes makes my case for me."
I think the "vote multiplier" could be automated pretty easily. Perhaps once the user meets certain requirements like achieving a good judgment score coupled with long-term account activity the multiplier would automatically increase. One of the side effects of weighted voting is that it would create new power dynamics among the debate community, which may or may not be a good thing.
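
A minimal Python sketch of such an automated multiplier might look like this. The cut-off values (90 and 365 days, ratings of 10 and 50) and the function names are purely hypothetical; the point is only that the rule needs no human in the loop once the thresholds are chosen.

```python
def vote_multiplier(judge_rating: int, account_age_days: int) -> int:
    """Hypothetical rule: weight grows with judgment rating and account age."""
    if account_age_days < 90 or judge_rating < 10:
        return 1  # new or unproven voters get the base weight
    if account_age_days < 365 or judge_rating < 50:
        return 2
    return 3  # long-standing, well-rated judges


def weighted_total(votes: list) -> dict:
    """Sum points per side, scaling each vote by its voter's multiplier.

    Each vote is a tuple: (side, points, judge_rating, account_age_days).
    """
    totals = {}
    for side, points, rating, age in votes:
        totals[side] = totals.get(side, 0) + points * vote_multiplier(rating, age)
    return totals
```

For instance, a 3-point vote from a long-standing, well-rated judge would count for 9 here, while a 5-point vote from a ten-day-old account would still count for 5.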

I'm generally confused why you would bring this up, it seems counter to your position. So we both agree that the most pragmatic solutions probably lie somewhere in between the spectrum of automation <---> administration. The problem is that we can't have a debate if we agree with each other, which is why we agreed to adhere to the following rules:

--Con will be forced to argue using manual, human-labor solutions alone, as agreed.
--Pro will be forced to argue using crowd-sourced solutions alone, as agreed.

Weighted voting alternative
There are other ways to encourage people to be good voters. Weighted voting is an extremely direct approach, but there are more subtle ways to influence behavior. For example, a simple "badge" next to your name, a badge earned through dozens of well-judged debates and constructive feedback, might be significant enough to naturally encourage good voting practices. The badge could be awarded to anyone who breaks into the top 20% judgment rating, for example.
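
Picking out the top 20% could be as simple as the Python sketch below. The function name badge_holders and the tie-breaking rule (alphabetical by name) are assumptions for illustration.

```python
def badge_holders(ratings: dict, top_fraction: float = 0.2) -> list:
    """Return the judges whose rating falls in the top `top_fraction` of all judges."""
    if not ratings:
        return []
    # Highest rating first; ties are broken alphabetically so the result is stable.
    ranked = sorted(ratings, key=lambda name: (-ratings[name], name))
    # Always award at least one badge, even on a very small site.
    cutoff = max(1, int(len(ranked) * top_fraction))
    return ranked[:cutoff]
```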

"Debating is a fine art combined with science that takes severe finesse to judge."
What metrics make a good judge?
What makes a "good judge"? It's not so easy to define really. Is a good judge one who is non-biased? One who gives constructive criticism? How do we measure "Judge fitness"? I'll leave that question open for now.

Perhaps I could take the judge-rating thing a step further. The single rating could be split into several subcategories, such as bias, constructiveness, and adherence to the code of conduct's strict voting rules. Perhaps a single number is not sufficient to measure something this complex.

Tracking bias
So it's pretty easy to track bullying and collusion. You simply need to track how frequently a user votes for/against other users and compare that to the average. Bias isn't always predatory though; sometimes bias is just inherent.

You could take this a step further and "tag" the debates to indicate category (using crowdsourced labor, similar to what Quora does). You can measure the way the user tends to vote in different categories to detect general biases like always voting Atheist/Theist in a religious debate or always voting left/right in political debates. This seems like a valuable metric for a debating website.
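
The "compare that to the average" idea can be sketched in Python as follows. This is only one possible reading: the function name flag_targeting, the minimum of 5 votes and the 0.3 margin are all assumed values, and the same counting could be done per debate category instead of per debater to detect topic bias.

```python
from collections import Counter


def flag_targeting(votes: list, min_votes: int = 5, margin: float = 0.3) -> list:
    """Flag (voter, debater) pairs where the voter votes against the debater
    noticeably more often than voters do on average.

    Each vote is a tuple: (voter, debater, direction), direction being
    "for" or "against".
    """
    pair_total, pair_against = Counter(), Counter()
    debater_total, debater_against = Counter(), Counter()
    for voter, debater, direction in votes:
        pair_total[(voter, debater)] += 1
        debater_total[debater] += 1
        if direction == "against":
            pair_against[(voter, debater)] += 1
            debater_against[debater] += 1

    flagged = []
    for (voter, debater), count in pair_total.items():
        if count < min_votes:
            continue  # too few votes to say anything
        pair_rate = pair_against[(voter, debater)] / count
        # Average over all voters, including this one.
        average_rate = debater_against[debater] / debater_total[debater]
        if pair_rate - average_rate >= margin:
            flagged.append((voter, debater))
    return sorted(flagged)
```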
Con
#4
I am about to display to everyone on this website how to annihilate many points with a truly imperative understanding of the burden of proof, logic and the topic at hand.

Please watch and listen to this to get a vibe of my prowess at this moment in time:



I state this: the thing that good votes are based on is not how many people like them. This is precisely irrelevant and a corrupt thing to base a system on. A good vote is based upon good reasoning, analysis of the arguments, processing the logic and its interactions deeply, and so on and so forth.

This means the positive reinforcement is going to go more to people who make votes (whether good or not) that many like, and the punishment to those who make votes (whether good or not) that many dislike.

Every other element of handling it that Pro says needs to be done (spotting bullying, ganging, toxic targeting, whatever else) is precisely conceding to Con that the ultimate authority must be mods and that everything about incorporating crowd sourcing is going to increase the complexity of the system, need many more updates to the website, cost more manpower in the mod team and tech team to spot cheaters etc. And guess what? The entire system would have a faulty basis.

The reason crowdsourcing works for something like Facebook is that the very condition, the exact pole towards which 'good posts' are drawn, is being liked by many. That is genuinely the win-condition for 'best post', but it is not at all, even remotely, the thing to aim for in writing a good vote. The entire system fails to do what it sets out to do and, on top of that, since this is about 'scaling', it is going to require more and more server space, mod teams, tech updates and complex rule alterations to handle it.
Round 3
Pro
#5
"I state this: the thing that good votes are based on is not how many people like them. This is precisely irrelevant and a corrupt thing to base a system on. A good vote is based upon good reasoning, analysis of the arguments, processing the logic and its interactions deeply, and so on and so forth."

"This means the positive reinforcement is going to go more to people who make votes (whether good or not) that many like, and the punishment to those who make votes (whether good or not) that many dislike."
I assert that the culture of a group determines how they up/downvote judgments. We, as a group, sculpt the culture of this website. If high-quality votes are socially expected, then that's what you'll get. The real goal is to build up an active userbase of good judges, such that the culture gets passed on naturally.

"The ultimate authority must be mods and that everything about incorporating crowd sourcing is going to increase the complexity of the system,"
Automation reduces the mods' workload. Automatically removing bad votes, automatically flagging alt-accounts, creating metrics to determine good and bad voters: all of these things are designed to make the administrator's life easier. The alternative to this is just getting more random internet friends to help out and hoping they don't quit.

"incorporating crowd sourcing is going to increase the complexity of the system, need many more updates to the website, cost more manpower in the mod team and tech team to spot cheaters etc"
Building an up/downvote feature is something they teach in a high school web programming course. It's really not rocket science. Detecting two accounts with the same IP is trivially easy. Seriously, just type "what's my IP" into Google. Automatically removing downvotes can be done with a single line of code. I knew you would go for this argument, which is why I chose simple features.
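
To give a sense of the size of the same-IP check, here is a short Python sketch. The function name accounts_sharing_ip and the login-record layout are assumptions, and the addresses in the usage note are reserved documentation addresses. A shared IP is treated here only as a signal worth flagging, since households and VPN users can legitimately share one.

```python
from collections import defaultdict


def accounts_sharing_ip(logins: list) -> dict:
    """Map each IP address to the accounts seen on it, keeping only shared IPs.

    Each login is a tuple: (account, ip_address).
    """
    accounts_by_ip = defaultdict(set)
    for account, ip_address in logins:
        accounts_by_ip[ip_address].add(account)
    return {
        ip: sorted(accounts)
        for ip, accounts in accounts_by_ip.items()
        if len(accounts) > 1  # a single account on an IP is unremarkable
    }
```

Called with logins for "alice" and "alice_alt" from 203.0.113.7 and "bob" from 198.51.100.2, it would report only the first address, with both accounts listed against it.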

-------------------------

I'll point out that Con hasn't made a single argument defending their position - they've chosen only to attack mine. Why is manual administration the way to go?
Con
#6
The purpose of crowd-sourcing is to make popularity the punishment and/or reward system on a website where the quality of posts is determined by popularity. I have explained explicitly (and nowhere does Pro, so far, deny) that the actual basis of a good vote is 0% how many people liked it and 100% whether an entrusted member of staff, with deep knowledge of the Code of Conduct regarding voting moderation and the trustworthiness to carry out such moderation, thinks it's an acceptable vote or not.

The only element of Pro's case so far that remotely 'poked a hole' was that there's a lack of a reward system, but I countered this, explaining that if there ever should be a reward system, it should still come from manual moderation and not crowd-sourcing, as the popularity of a vote should never determine the best vote. The flip-side is blatantly worse, as sufficient but unpopular votes (which could simply be made by unpopular users, but apparently Pro is going to magically screen for grudge voting) are all going to be negated as 'too bad' or removed entirely if they get enough negative ratings.

This then, as I stated in all Rounds so far, means the ultimate authority is always going to have to be moderators manually moderating the crowd-sourcing system. Pro says the site will have the finances, manpower and raw expertise to pull off automated grudge-checking and alt-checking (deeper than just IP checks, which can be spoofed: https://www.iplocation.net/ip-spoofing) and, on top of that, ensure that... the votes are based on the CoC... Or does Pro actually say that at all?

Everything about Pro's system scales worse because it leaves the exact same moderators in charge of handling or masterminding the crowd-sourcing discipline, since popularity is not the scale on which a good vote should be based, and even less should it have the power to remove votes at the other end of the popularity spectrum. The notion that most users care enough about the website and debating to vote well and honestly is called into question, firstly because the lowest-ranked debaters are going to have an equal say to the highest-ranked (despite objectively having displayed poor understanding of debate so far on the website), and by a whole array of other things. To be very clear, the scaling judgement weight that Pro offers isn't for the ones upvoting; it comes from the upvotes, which are blindly assumed to be correct. Popularity of a vote does not determine a good vote. I don't get how explicit to make this...

Round 4
Pro
#7
Forfeited
Con
#8
That's what I thought.