28  How Things Go Wrong

In this chapter we explore how ethical boundaries can get crossed. The sections provide only a sampling, and the case studies and examples show that this is not an academic exercise. Follow the links in the examples for a deeper dive into the particulars of a case.

I recommend studying the examples on first read without an assumption of malfeasance; people often engage for the right reasons and somewhere along the way things go off the rails. For example, an algorithm might not be objectionable in its own right, but applying it at scale to a certain demographic can be unethical. Modeling price elasticity is standard practice; applying such a model to data collected from organizations competing in the same market in order to raise prices above what a healthy, functioning market would provide can earn you an antitrust lawsuit in federal court.

Also look out for opportunities for human intervention—the human in the loop, as it is called. Many algorithmic systems are developed to support decisioning; the ultimate call is supposed to be made by a human being who uses the algorithmic answer as another data point. The examples are full of situations where a human is built into the system, yet the algorithmic answer does not get sufficient scrutiny. Bad things happen when responsibility is abdicated because of resource constraints, ill will, or excessive trust in the analytics.

28.1 Cybersecurity

With respect to the consequences of missteps and unintended outcomes, data systems are entering a phase that cybersecurity went through previously. Today, the cybersecurity posture of an organization is a top priority. CISOs (Chief Information Security Officers) are common among high-ranking executives, whereas Chief Data Officers and Chief Analytics Officers tend to be found in Director-level positions. That is changing as the strategic importance of data-related activity increases; CDOs and CAOs are on the way up the corporate ladder.

Information security, the security of IT assets, data, and operations, is a modern form of risk management and mitigation. This includes safeguarding information about the organization and its clients (customers).

Example: Equifax Data Breach

Apache Struts is a web application framework for developing Java applications. On March 7, 2017, a patch was released for Struts to fix a security vulnerability. The national credit bureau Equifax had not updated the applications on its website when the vulnerability was exploited on its site on March 12, 2017. The hackers accessed information about Equifax employees and used their credentials to scan databases and extract personal information on over 160 million people. The breached information included names, birth dates, addresses, social security numbers, and driver’s license numbers.

While it took only five days from the release of the security fix to an exploit, it took Equifax more than two months to discover the breach and until September 2017 to disclose it. A week later, the Chief Information Officer and the Chief Security Officer left the company. Hundreds of lawsuits were filed against Equifax after the breach became public. In 2019, Equifax settled with the Federal Trade Commission for $575 million: $475 million in restitution and $100 million in fines.

After the Equifax data breach, the Chief Information Officer of a major international bank called Apache Struts “the CIO killer.”

Read more about the Equifax case here, including the criticism of the company for its inadequate response, fumbling notification websites, and executives selling stock before the breach was made public.

28.2 Model Risk Management

Data breaches are not the only source of data-related risk to be mitigated. You can now add the risk from applying models trained on data to the list—models can be incorrect, and they can be misapplied and misused. In financial services, model risk management (MRM) has been a staple for years.

Definition: Model Risk Management (MRM)

Model risk management is the management and mitigation of risks that arise from the consequences of decisions based on models that fail to perform.

In general, MRM is concerned with risks from the consequences of decisions based on incorrect, misapplied, or misused models. The FDIC (Federal Deposit Insurance Corporation) provides guidance to banks about effective model risk management. Banks routinely rely on models for valuations, underwriting credit, determining the adequacy of reserves, safeguarding client assets, and so on. The use of models invariably presents risk that can lead to financial loss, poor decision making, and damage to reputation. The FDIC notes two primary reasons for model risk:

  • Models may be fundamentally wrong due to incorrect assumptions, underlying theories, choice of sample design, choice of numerical routines, implementation errors, inadequate or erroneous inputs, and so forth.

  • The model may be fundamentally sound but is used or applied incorrectly. Real-world events might prove the underlying assumptions of a model inappropriate—e.g., a global pandemic. The model might be applied to situations for which it was not developed, for example, to new products or markets (a scenario that a routine drift check, sketched below, is meant to catch).
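
Both failure modes leave statistical fingerprints in the data a model scores. Below is a minimal sketch of one routine check: it compares the distribution of a single input feature between the training sample and a recent production sample using a two-sample Kolmogorov–Smirnov test. The feature values and the alert threshold are made up for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Hypothetical feature: the sample the model was trained on and a more
# recent production sample whose distribution has shifted (data drift).
income_train = rng.normal(loc=50_000, scale=12_000, size=5_000)
income_prod = rng.normal(loc=58_000, scale=15_000, size=5_000)

# Two-sample Kolmogorov-Smirnov test: could both samples come from the
# same distribution?
stat, p_value = ks_2samp(income_train, income_prod)

print(f"KS statistic = {stat:.3f}, p-value = {p_value:.2e}")
if p_value < 0.01:  # alert threshold chosen for illustration only
    print("Input distribution has drifted; trigger a model review.")
```

A real MRM process would monitor many features and performance metrics over time, but the principle is the same: compare what the model sees in production against what it was validated on.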

We recognize in the FDIC's two points the concepts of data drift and model drift discussed earlier. It is also clear that model risk management is not unique to the financial services industry—it applies everywhere. A key principle of effective MRM is to challenge models effectively; testing and validation play an important part. When models are developed and tested under assumptions that do not reflect their use in real life, model risk increases dramatically. This is what happened to Microsoft Tay.

Example: Microsoft Tay

In 2016, Microsoft released a Twitter AI chatbot called Tay (as in Thinking-about-you) that was trained to behave like a teenager in interactions with Twitter users and to learn from them. Within less than a day online, Tay had turned into a hateful racist, claiming in offensive tweets that, for example, “Hitler was right”, recommending that feminists should all die, and denying the Holocaust. Microsoft blamed Tay’s responses on trolls deliberately teaching the AI bot inflammatory content.

Tay was a successful AI project in that the bot learned how to communicate based on whom it was communicating with. But the internet can be a dark and awful place, and Tay learned awful things from the people who engaged with the bot. It was not a successful AI project in that it did not incorporate content moderation and boundaries. The developers probably assumed that users on the internet would interact with the bot the way the developers were interacting with it.
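
What “boundaries” could have looked like is not complicated in principle. The toy sketch below gates what the bot is allowed to learn from behind a moderation check; the blocklist and the learn() function are purely illustrative assumptions. Real moderation relies on trained classifiers and human review, and nothing here reflects Microsoft’s actual system.

```python
# Toy sketch of a learning loop gated by a moderation check. The blocklist
# and learn() are purely illustrative; real moderation uses trained
# classifiers and human review, not a keyword list.

BLOCKLIST = {"hitler", "holocaust"}   # placeholder terms only

corpus: list[str] = []                # messages the bot is allowed to learn from

def is_acceptable(message: str) -> bool:
    """Reject messages that contain blocked terms."""
    return set(message.lower().split()).isdisjoint(BLOCKLIST)

def learn(message: str) -> None:
    """Add user input to the training corpus only if it passes moderation."""
    if is_acceptable(message):
        corpus.append(message)
    # Dropping flagged input silently is one policy; logging it for human
    # review would be another.

for tweet in ["nice weather today", "hitler was right"]:
    learn(tweet)

print(corpus)   # only the benign message is retained
```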

When systems can learn, how do we make sure that they do not learn bad things? Is it ethical to limit what a system can learn? Or is it unethical to expose users to what a system can possibly learn?

Microsoft did not break any laws with Tay. The bot was unethical in that it was developed without safeguards to prevent offensive and hurtful responses. AI bots are highly complex pieces of software; the unintended consequences arose even though Tay was good enough for its intended application—it learned from its interactions exactly as designed. It probably worked very well in a sheltered test environment with benign interactions.

Microsoft CEO Satya Nadella gave the Tay episode credit for how Microsoft is approaching AI today.

The reasons for model risk stated by the FDIC are somewhat obvious: (i) the model is wrong, (ii) the model is correct but misapplied. We add another source of model risk: models that are trained well, almost too well. The following example reminds us that by merging, processing, and modeling data we can find out more than we should know. While it might not be illegal to have that knowledge, it is unethical to use that knowledge.

Example: Target Pregnancy Prediction

An angry father walked into a Minneapolis Target store and demanded to know why the company was sending his teenage daughter coupons for pregnancy- and baby-related items. He wanted to know whether Target was trying to encourage his daughter to become pregnant. It turns out Target had indeed sent pregnancy marketing material to her, and it turns out that she was indeed pregnant, unbeknownst to her father at the time he walked into the store.

It is disputed whether this actually happened or is merely an anecdote. What is not disputed, as reported by the New York Times, is that Target had developed a predictive model that assigns customers a pregnancy score, based on demographic data merged with data from customers who had baby registries at a store, and on a set of products that indicates a high probability of pregnancy when purchased together.

By analyzing data of customers who had a baby registry, Target developed a predictive model for all shoppers. The algorithm performed so well that Target could determine the trimester and estimate the due date. This enabled all kinds of targeted, personalized marketing efforts. From the New York Times:

… women on the baby registry were buying larger quantities of unscented lotion around the beginning of their second trimester. Another analyst noted that sometime in the first 20 weeks, pregnant women loaded up on supplements like calcium, magnesium and zinc. Many shoppers purchase soap and cotton balls, but when someone suddenly starts buying lots of scent-free soap and extra-big bags of cotton balls, in addition to hand sanitizers and washcloths, it signals they could be getting close to their delivery date.
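
The New York Times description amounts to a model that converts purchase signals into a probability. The sketch below shows that general idea with a logistic regression fit to synthetic data; the features, coefficients, labels, and scores are invented for illustration and do not represent Target’s actual model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 2_000

# Hypothetical basket features inspired by the NYT description: quantities of
# unscented lotion, supplements, and cotton balls bought by each customer.
X = rng.poisson(lam=[0.5, 0.8, 0.3], size=(n, 3)).astype(float)

# Synthetic "known pregnant" labels (think: baby-registry customers), generated
# so that larger purchases of these items raise the probability.
logits = -3.0 + 1.2 * X[:, 0] + 0.9 * X[:, 1] + 1.1 * X[:, 2]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

# Fit a "pregnancy score" model on the labeled customers ...
model = LogisticRegression().fit(X, y)

# ... and score a new shopper who suddenly stocks up on these items.
new_shopper = np.array([[3.0, 2.0, 4.0]])
print(f"Pregnancy score: {model.predict_proba(new_shopper)[0, 1]:.2f}")
```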

Target did not run afoul of any laws; they used internal and public data available to them. They developed a model that performed well by statistical standards. But just because you can do something does not mean you should. You can learn something about someone legally, and it can still be ethically questionable or even inappropriate to use that information.

Target customers were actually creeped out when they learned that the retailer knew intimate personal details about them. The company adjusted its marketing approach to be more subtle. The predictive models are still being used.

28.3 Unintended Consequences

The law of unintended consequences describes the phenomenon of an action leading to results that are not the action’s intended purpose. Typically, we think of unintended consequences as detrimental; they can be beneficial as well.

Technology, all the inventions of the human mind, is full of unintended consequences. The comedian Chuck Nice says that future technology comes with two things: promises and unintended consequences.

You can reduce the likelihood of unintended consequences—positive or negative ones—by deeply understanding the systems involved. Greater complexity likely leads to more unintended consequences, since complex systems cannot be fully comprehended. Unfortunately, our capabilities through innovation and our ability to build complex systems increase more quickly than our ability to foresee the consequences. In his TED talk, Edward Tenner gives examples of unintended consequences of innovation throughout history. For example, the introduction of HVAC systems in the 1970s raised the temperature of water in these systems to the optimum for the reproduction of the Legionella bacterium, which causes Legionnaires’ disease—an unintended consequence. A bactericide was added to HVAC systems to prevent the disease. In the early 1980s it was observed that computer tape drives located close to ventilation ducts were failing. The bactericide contained traces of tin, which destroyed the tape heads—an unintended consequence of the safety intervention.

As we build more complex and more opaque systems from data, we should expect more unintended consequences. While we develop these systems with the best of intentions, we should always ask, “What could go wrong?”

Example: Facebook’s Year in Review

When Facebook first rolled out the Year in Review feature, it was intended to digitally scrapbook the previous year based on highly liked and photo-heavy content. Cheerful iconography surrounded images through which Facebook users were prompted to create their digital scrapbook for the year.

Author and designer Eric Meyer called it Inadvertent Algorithmic Cruelty in his blog, when Facebook invited him to celebrate the past year with an image of his daughter, who had died. He writes:

This inadvertent algorithmic cruelty is the result of code that works in the overwhelming majority of cases, reminding people of the awesomeness of their years, showing them selfies at a party or whale spouts from sailing boats or the marina outside their vacation house.

But for those of us who lived through the death of loved ones, or spent extended time in the hospital, or were hit by divorce or losing a job or any one of a hundred crises, we might not want another look at this past year.

The application is designed with an ideal user in mind, someone who is happy, had a great year, and would like to see a summary. It does not consider the worst-case scenario and lacks empathy for the far-from-ideal users. Meyer concludes:

If I could fix one thing about our industry, just one thing, it would be that: to increase awareness of and consideration for the failure modes, the edge cases, the worst-case scenarios.

It is easy to brush aside unintended consequences as not knowable, the so-called unknown unknowns. They will certainly remain unknown if you do not want them to be known. Hindsight always has 20/20 vision, but some unintended consequences are in fact knowable and foreseeable. We have to make an effort to think about possible unintended consequences and be creative in imagining what could happen. Data sharing is an interesting case in point. Much good can be done by combining data sources that complement each other—the total can be more than the sum of the parts. Medical research can benefit from combining anonymized data from clinical trials that are conducted in isolation. Data collected in an app with user consent can be made available to a wider audience and make the world a better place—or it can backfire spectacularly.

Example: Strava Exercise Maps

The exercise app Strava opened up its data on Strava.com in 2018 to allow users to find new places to exercise. The move was well intended. Strava had collected at that time data on more than a billion runs, walks, bicycle rides, and swims all over the world. For someone looking for a place to exercise, that information would be highly valuable. Maybe you can find a trail or bike path in your neighborhood that you were not aware of.

However, the move turned into a privacy nightmare (Burgess, n.d.). Military researchers discovered that the heatmaps of exercise activity made it possible to study the workouts of military personnel around military bases. Although the data shared online was anonymized, it was possible to de-anonymize it by requesting information for a specific geographic location. With this method it was even possible to see the names and heart rates of some individuals. If your daily run starts and ends at a military base, then it is easy to infer that you are somehow connected to the military. By revealing common movements, someone with ill intent could use the information to figure out how a military base operates.
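
The inference described here requires nothing more sophisticated than the start point of an activity and the boundary of a sensitive site. A minimal sketch, with a made-up bounding box and made-up activity records:

```python
# Flag activities whose start point lies inside a sensitive area. The bounding
# box and the activity records are made up for illustration.

SENSITIVE_AREA = {          # rough lat/lon box around a fictional base
    "lat_min": 34.90, "lat_max": 34.96,
    "lon_min": 66.10, "lon_max": 66.18,
}

activities = [
    {"user": "runner_017", "start": (34.93, 66.14)},
    {"user": "cyclist_442", "start": (40.71, -74.01)},
]

def starts_inside(activity: dict, box: dict) -> bool:
    lat, lon = activity["start"]
    return (box["lat_min"] <= lat <= box["lat_max"]
            and box["lon_min"] <= lon <= box["lon_max"])

flagged = [a["user"] for a in activities if starts_inside(a, SENSITIVE_AREA)]
print(flagged)   # ['runner_017'] -- likely connected to the site
```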

The Guardian determined that by drilling into the data at Strava.com, they were able to reveal the names of 50 U.S. service members stationed at an airbase in Afghanistan. From changes in running profiles one can also glean when service members are transferred.

Revealing information about military operations was an unintended consequence of the release of information on Strava.com. It was not a data breach; the data was released deliberately, with other goals in mind. According to the Guardian, the company argued that the information had already been made public by the users who uploaded it. In hindsight, it was a foreseeable consequence of sharing geotagged data.

28.4 Measuring Performance

Should we accept a more accurate approach over a less accurate one? Many would say “Yes”; greater accuracy is better. But the more accurate method can have other shortcomings, for example, it may be inscrutable or very difficult to implement, and the nod might go to the less accurate method that is easier to implement. In this situation we are making a deliberate choice to prefer one approach over another despite its inferiority with respect to a specific metric. Other considerations also matter.

All things being equal, accuracy is not necessarily the appropriate measure by which to judge the performance of a decision rule. The confusion matrix of a binary classification leads to many performance measures for the model; see Section 18.3.3.3.
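
A small worked example illustrates the point: the two hypothetical classifiers below have identical accuracy yet very different false positive and false negative rates. The confusion-matrix counts are invented for illustration.

```python
# Two hypothetical classifiers evaluated on the same 1,000 cases.
# Counts: (true positives, false positives, false negatives, true negatives)
classifiers = {
    "Model A": (180, 20, 120, 680),
    "Model B": (250, 90, 50, 610),
}

for name, (tp, fp, fn, tn) in classifiers.items():
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    fpr = fp / (fp + tn)   # false positive rate
    fnr = fn / (fn + tp)   # false negative rate
    print(f"{name}: accuracy = {accuracy:.2f}, FPR = {fpr:.3f}, FNR = {fnr:.3f}")
```

Both models are 86% accurate, yet Model A misses far more true positives while Model B raises far more false alarms. Which error matters more depends entirely on the application.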

Example: COMPAS Tool to Predict Recidivism Risk

Judges, parole officers, and probation officers are increasingly relying on algorithms to determine an offender’s propensity for recidivism—to re-offend. A higher likelihood of re-offending might result in a stiffer sentence. One such algorithm was developed by Northpointe, Inc. and is part of their COMPAS tool (Correctional Offender Management Profiling for Alternative Sanctions).

When a defendant is booked into jail, a questionnaire of 137 questions is filled out, answered in part by the defendant and in part by pulling data from criminal records. Questions include things like “How often did you get in fights while at school?” and “How often have you moved in the last 12 months?”, and whether the defendant agrees with statements such as “Some people just don’t deserve any respect and should be treated like animals”. The questionnaire is known, but the algorithm that uses the answers to score defendants on the likelihood to re-offend within two years is proprietary. The COMPAS tool produces two risk scores: the risk of general recidivism and the risk of committing a violent crime in the future.

The idea of using algorithms to make key decisions in the legal process, not guided by conscious or unconscious personal bias, is appealing. An accurate prediction of recidivism can make the justice system fairer. But that requires that the algorithm works; it cannot itself be biased. Someone said that even a slightly flawed algorithm is better than a bigoted judge, but the consequences of applying even a slightly biased algorithm at scale, across the nation, in thousands of cases can be disastrous.

In 2016, ProPublica investigated the performance of the proprietary algorithm using data from more than 7,000 criminal defendants in Broward County, Florida (Angwin et al. (2016), Larson et al. (2016)). For these defendants, both the predicted risk of recidivism and whether the individual actually re-offended were known. This allowed the investigators to probe how well the COMPAS model actually does in predicting outcomes.

A notebook with data management and analysis using R and Python is available on GitHub, along with the data made available by ProPublica.

The ProPublica analysis uses logistic regression and survival models to compare COMPAS scores among races and genders. The results include confusion matrices for recidivism classification (Low/High) and future violent crime (Low/High). Some of the important findings are:

  • 51% of defendants were Black, 34% White, 8% Hispanic

  • 81% of defendants were Male, 19% were Female.

  • Black defendants are 45% more likely than White defendants to receive a higher risk score when correcting for seriousness of the crime, previous arrests, and future criminal behavior. The risk score for future violent crime overpredicts recidivism for Black defendants by 77% compared to White offenders.

  • Women are 19% more likely to receive a higher risk score than men.

  • Defendants under 25 are 2.5 times more likely to receive a higher risk score than middle-aged defendants.

  • The predictive accuracy of the COMPAS model is only 63% (as measured by a Cox proportional hazards model).

  • Only 20% of those predicted to commit violent crimes went on to do so.

  • The false positive rate of the model is 32.3% overall, 44.8% for Blacks, and 23.4% for Whites.

  • Under COMPAS, Black defendants are 91% more likely to get a higher score and not go on to commit more crimes over the next two years.

  • COMPAS scores misclassify White re-offenders as low risk 70.4% more often than Black re-offenders.

  • Black defendants are twice as likely to be false positives for a higher violent crime score than White defendants.

  • White defendants are 63% more likely to get a lower violent crime score and go on to commit another crime.

With an accuracy of just over 60%, the system is barely better than a coin flip, which would have an accuracy of 50%. The racial disparities in the predictions from the model are deeply troubling. In predicting who would re-offend, the algorithm made mistakes for Black and White defendants at similar rates but in very different ways.

                                                            White    Black
Labeled higher risk for recidivism but did not re-offend    23.5%    44.8%
Labeled lower risk for recidivism and did re-offend         47.9%    28.0%

For Blacks, the algorithm has a high false positive rate (44.8%). For Whites, the algorithm has a high false negative rate (47.9%). When the algorithm predicts incorrectly, it leads to lighter consequences for Whites and to tougher consequences for Blacks.
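
These group-wise error rates are straightforward to compute once predictions and outcomes are known for each group. The sketch below uses invented counts chosen only to reproduce the percentages in the table above; it is not ProPublica’s actual data or code.

```python
# Invented counts per group, chosen only to reproduce the percentages above.
# fp: labeled higher risk but did not re-offend   tn: labeled lower risk, did not re-offend
# fn: labeled lower risk but did re-offend        tp: labeled higher risk, did re-offend
groups = {
    "White": {"fp": 235, "tn": 765, "fn": 479, "tp": 521},
    "Black": {"fp": 448, "tn": 552, "fn": 280, "tp": 720},
}

for group, c in groups.items():
    fpr = c["fp"] / (c["fp"] + c["tn"])   # wrongly labeled higher risk
    fnr = c["fn"] / (c["fn"] + c["tp"])   # wrongly labeled lower risk
    print(f"{group}: false positive rate = {fpr:.1%}, false negative rate = {fnr:.1%}")
```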

28.5 Automation and Scale

We automate processes to achieve greater efficiency and productivity, to scale up operations, and to reduce time spent on mundane repetitive tasks. As someone explained,

when I do something for the second time, I think about how I did it last time. If I do it for the third time, I think about how to automate it and never do it again.

Scaling a process through automation is wonderful if automation makes the process better or if the outcome is inherently good. If automation of tax refunds means that refunds reach taxpayers sooner, no one will complain. If automation of traffic signals means better traffic flow through the city and less time spent staring at brake lights, grumbling is unlikely.

However, if automated decisions are of lesser quality, more biased and potentially harmful, then exercising them at scale can cause great harm.

A cruel example where this scenario played out is the former “Robodebt” system of the Australian Government.

Example: Robodebt Scheme of the Australian Government

Between 2016 and 2019, the government of Australia, through its social services agency Centrelink, wanted to save billions of dollars by recovering social security (welfare) overpayments to recipients. The process previously in place checked for compliance manually and resulted in about 20,000 cases per year. It was based on matching income reported in two different places: annual income, for which the tax office served as the system of record, and the bi-weekly income self-reported by the welfare recipient. If there was a discrepancy between the reported incomes, a staff member would investigate and ask the recipient to explain the difference.

An automated system, it was thought, could do this more efficiently, issue more debt notices, and hence recover more overpayments. The system would more than pay for itself. Such a system was put in place in 2016. In its first six months of operation, it had already issued 169,000 debt notices. But that was not the only difference between the manual and the automated system.

While the manual (human-based) system would issue a notice asking the recipient to please explain the difference, the new automated system, when it found a discrepancy between tax-reported and self-reported Centrelink income, would automatically issue a debt notice and expect repayment—hence the name “Robodebt”. The presumption of innocence turned into a presumption of guilt. The onus was suddenly placed on the individual to prove that they did not owe the government funds—without human verification or interaction.

This is a horrible practice, unethical and possibly illegal, even if the overpayments are correctly assessed. But that is another way in which the automated system went astray. Robodebt had a staggering false positive rate of at least 27%! If you were a social security recipient who had received the appropriate payment from the government, you had more than a 1-in-4 chance of being issued a demand to repay an alleged overpayment.

Over the period of operation, 567,000 debts were raised through the use of the Robodebt algorithm.

After the government responsible for Robodebt was voted out of office, the incoming administration killed the program and announced that 470,000 wrongly issued debts would be repaid in full. That is an error rate of over 80%!

In the article “The Robodebt strategy” in the December 2023 issue of Significance, Trewin, Fisher, and Cressie (2023) note the following main statistical flaws of Robodebt:

  1. Using annual tax income to match against bi-weekly self-reported income. Bi-weekly income can fluctuate greatly throughout the year and cannot simply be compared against the annual income spread evenly over the year (see the sketch after this list).

  2. No documented understanding of error sources with the data or the data matching process.

  3. No sensitivity analysis and inadequate testing of algorithms.

  4. No understanding of error rates.

  5. No involvement of professional statisticians.
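
The first flaw, income averaging, is easy to demonstrate. In the hypothetical scenario below, a recipient works only part of the year and reports every fortnight accurately, yet spreading the annual tax figure evenly over 26 fortnights manufactures an apparent discrepancy. The numbers are made up, and the actual debt calculation also involved benefit rates and thresholds.

```python
# A hypothetical recipient works only 10 of 26 fortnights, earning $2,000 in
# each of those fortnights, and correctly reports $0 for the rest.
fortnightly_actual = [2_000] * 10 + [0] * 16       # what was truthfully reported
annual_income = sum(fortnightly_actual)            # $20,000 on the tax return

# Robodebt-style averaging spreads the annual figure evenly over the year ...
fortnightly_averaged = annual_income / 26          # about $769 per fortnight

# ... which makes it look as if income was under-reported in the 16 work-free
# fortnights, even though every single report was accurate.
phantom = sum(max(fortnightly_averaged - actual, 0) for actual in fortnightly_actual)

print(f"Averaged fortnightly income: ${fortnightly_averaged:,.2f}")
print(f"Apparent (phantom) under-reported income: ${phantom:,.2f}")
```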

A system that was supposed to save the government A$300 million cost it A$1.2 billion in repayments alone. The human toll was devastating: some recipients of debt notices had committed suicide.

The statistician Noel Cressie called Robodebt

a catastrophic program that was legally and ethically indefensible—an example of how technological overreach, coupled with dereliction of duty can amount to immense suffering for ordinary people.

He noted that Jensen’s inequality tells us that if the algorithm called out overpayments it should have also detected underpayments and issued credits (Cressie (2023)). Robodebt never issued any credits.

28.6 Bad Data, Flawed Algorithm, Poor Implementation

Implementing a flawed algorithm poorly and feeding it bad data is a recipe for disaster. In the following example, that recipe led to the murder of victims of domestic violence. As a recent audit of the system suggests, at least some of these murders were preventable.

Example: VioGén Algorithm to Monitor Gender Violence

Since 2007, Spain has been operating the VioGén system, a web application to help Spanish authorities protect women and children from domestic violence. According to an external audit by the Eticas Foundation, reported by Politico and others, the system is deeply flawed. (See here for the full report in Spanish.)

The combination of numerous shortcomings, from bad data, to an opaque algorithm, to lack of human intervention, leads to an underestimation of the risk of violence. Women do not receive the proper protection, some are sent back into a domestic violence environment, and some have paid for it with their lives.

The underlying algorithm uses classical statistical models for risk classification, yielding a risk score based on a weighted sum of the responses to an intake questionnaire. The system is designed as a recommendation system, assigning a “recommended” score to an individual, with the opportunity for adjustment by a human. That human would usually be a police officer who runs the algorithm on the questionnaire answers provided by the victim of domestic violence. The Eticas audit revealed that in 95% of cases the score returned by the algorithm is not adjusted. Because of the lack of human intervention, the recommendation system becomes an automated decisioning system.

The algorithm assigns a case to one of the risk categories “unappreciated”, “low”, “medium”, “high”, and “extreme”. Only one in seven women who reached out to the police for protection actually received protection. Many women reporting violence (45% of cases) receive an “unappreciated” risk score. Only 3% of women who are victims of gender violence receive a risk score of medium or higher. Furthermore, the system caps the number of higher risk scores at the level at which protection is funded in the budget.
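
A weighted sum of binary questionnaire responses, mapped to categories by thresholds, is a simple construction. The sketch below illustrates that general mechanism with invented indicators, weights, and cut-offs; the actual VioGén weights and thresholds are not public.

```python
# Generic weighted-sum risk scoring over binary indicators. The indicators,
# weights, and category cut-offs are invented; the actual VioGén weights are
# not public.

WEIGHTS = {
    "weapon_used_previously": 5,
    "aggressor_is_jealous": 2,
    "victim_economically_dependent": 3,
    "threats_via_social_media": 1,
}

# Score thresholds for the five categories used by the system.
CATEGORIES = [(12, "extreme"), (9, "high"), (6, "medium"), (3, "low"), (0, "unappreciated")]

def risk_category(answers: dict[str, bool]) -> tuple[int, str]:
    """Sum the weights of the indicators marked 'present' and map to a category."""
    score = sum(w for k, w in WEIGHTS.items() if answers.get(k, False))
    label = next(name for cutoff, name in CATEGORIES if score >= cutoff)
    return score, label

# A police officer is supposed to be able to adjust the recommended category;
# per the Eticas audit, that rarely happens in practice.
answers = {"aggressor_is_jealous": True, "threats_via_social_media": True}
print(risk_category(answers))   # (3, 'low') with these made-up weights
```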

Is the algorithm biased, and is it underestimating the risk? Probably so. The algorithm focuses on physical violence and under-values psychological and non-physical violence, for example, violence through social media. The lack of transparency of the algorithm does not help. The fact that only 35% of the women are told their risk scores is suspicious.

What is not in doubt are the flaws in the collection of data for the risk prediction algorithm. When a woman makes an official complaint, the authorities present her with a standardized set of binary questions assessing 39 risk indicators as “present”/“not present”. The questions explore the severity of previous assaults (e.g., whether weapons were ever used), the characteristics of the aggressor (jealous, bully, sexual abuser, unemployed, drug addict, etc.), the vulnerability of the victim (pregnant, foreign, economically dependent, etc.), and aggravating factors. The questions are blunt and direct, sometimes embarrassing. At the time of the complaint, the victim of domestic violence is often terrified, under extreme duress, in a state of shock, and emotionally charged. That is not the time to fill out a very consequential questionnaire. Police officers are reported to leave victims alone with the questionnaire and to pressure them to finish it quickly.

The VioGén algorithm and questionnaire have been revised several times; the latest iteration appears to date from 2019. The Eticas Foundation audit was published in 2024. Recent efforts to improve the VioGén system aimed at introducing machine learning to first predict the probability of recidivism and then adjust that probability based on indicators about the perpetrator. A former employer of mine is involved in this effort; see here and here.

28.7 From Revenue Management to Price Fixing

Because algorithms can make decisions at scale, they are often said to be the key to driving harm and discrimination out of systems affected by the subjectivity and bias of human reasoning. We have seen numerous examples in this chapter where algorithms, even if developed with the best intentions, can cause harm. Any positive or negative effects are amplified at scale.

Revenue Management Systems (RMS) are computerized decision systems that determine prices for goods or services. The goal is typically to optimize—read maximize—revenue for the operator of the RMS. Suppose two competitors operate in the same market, offering the same or a similar product. The consumer is likely to choose the lower-priced alternative, motivating the competitors to compete on price. The competition benefits the consumer. If, however, the companies agree to not undercut each other’s prices and instead raise prices in parallel, the consumer is at a disadvantage. Effectively, the competitors are colluding to fix prices above what a healthy market would lead to. This is called supra-competitive pricing.

The collusion can be indirect in that it can happen through a third party that computes the optimal pricing for a product in a certain market, when the competitors are all using the same pricing tool. The RMS is the pricing tool in this case, and when it is the dominant (or only) mechanism to determine prices, we essentially approach monopoly conditions.

Example: IDeaS RMS for Hotel Management

A class action antitrust lawsuit filed in April 2024 in federal court in San Francisco alleges that SAS Institute, Inc., IDeaS, Inc., Hilton Worldwide Holdings Inc., Wyndham Hotels & Resorts, Inc., Four Seasons Hotels and Resorts US Inc., Omni Hotels and Resorts, Inc., Hyatt Hotel Corporation, and Choice Hotels International Inc. used price fixing to inflate the price of hotel rooms.

The agreement to fix prices might have been tacit, meaning the executives of the hotel chains did not get together in a room and agree to manipulate prices for their benefit. Instead, the anti-competitive action resulted from all hotel chains using the same revenue management system and from how that RMS operates. The RMS at the heart of the antitrust lawsuit is G3 RMS by IDeaS (Integrated Decisions and Systems, Inc.), a Minnesota-based subsidiary of SAS Institute. IDeaS was acquired by SAS in 2008.

Full disclosure: SAS Institute is one of my former employers; I worked there from 2002 to 2021 and toward the end of my tenure served as a high-ranking company executive.

This article by CBS News details how the IDeaS algorithm is alleged to support anticompetitive price fixing:

IDeaS’s clients, all competitors with each other, agree to supply IDeaS with proprietary, non-public, and sensitive information about their room availability, demand, and pricing. The information is provided continuously, in real time, so that IDeaS always knows what all of its client-competitors have available in a given market.

Like other revenue management vendors, IDeaS claims that its system can optimize prices better than a person could. The suit alleges that the hotels participate knowing which of their competitors are part of the conspiracy, that they know IDeaS is collecting similar information from participating competitors, and that faithful adherence to IDeaS’ price recommendations will yield greater revenue for all participants. According to the plaintiffs,

by sending their sensitive confidential pricing and occupancy information to a third party to process, analyze, and develop supra-competitive prices, the [defendants] are able to achieve the same result as if they secretly met in a back room and exchanged their information and agreed to a supra-competitive price.

For another example of how a revenue management algorithm leads to higher prices and how concentration in an industry can create anti-competitive conditions that put the consumer (here, the renter) at a disadvantage, check out the ProPublica investigation into the RealPage YieldStar software used to determine prices for rental housing.

The RealPage and IDeaS cases have things in common:

  • A proprietary optimization algorithm targets maximum revenue by modeling metrics such as price elasticity: how much does a change in price affect demand? There is nothing unethical or illegal in modeling price elasticity; it is a common economic metric (see the sketch after this list).

  • The models become very effective across sub-markets when they have access to large databases of market data: actual prices that consumers pay under known conditions. In a normally functioning market, competitors would not share that kind of data with one another. How you determine your price, and based on what information, is a trade secret.

  • The software vendors act as clearinghouses for data sharing across competitors. While the competitors might not see each other’s data directly, the sharing enables pricing algorithms that yield higher-than-competitive prices that the clients would otherwise not be able to achieve.

  • The market concentration of the participating companies creates anti-competitive pricing across many markets.
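
To make the first bullet concrete, the sketch below takes a hypothetical linear demand curve for hotel rooms and searches for the revenue-maximizing rate. Estimating such a curve from a firm’s own booking history is ordinary practice; the antitrust concern arises when the curve is estimated from competitors’ pooled, non-public data and all competitors follow the same recommendation. The parameter values here are invented.

```python
import numpy as np

# Hypothetical linear demand curve for a hotel sub-market: demand = a - b * price.
# In practice a and b would be estimated from booking data; the antitrust issue
# arises when that data is pooled across competitors.
a, b = 500.0, 1.25                       # rooms demanded at price 0; drop per $1

prices = np.linspace(50, 350, 301)       # candidate nightly rates in dollars
demand = np.clip(a - b * prices, 0, None)
revenue = prices * demand

best = np.argmax(revenue)
p_star, q_star = prices[best], demand[best]
elasticity = b * p_star / q_star         # magnitude of price elasticity at the optimum

print(f"Revenue-maximizing rate: ${p_star:.0f}, rooms sold: {q_star:.0f}, "
      f"elasticity there: {elasticity:.2f}")   # elasticity of 1 at the optimum
```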