Banks cannot ask new customers about their race, gender or religion on credit applications. This is designed to prevent discrimination. But for regulators who check to make sure banks do not discriminate, this creates a different problem.
The easiest way to check whether a bank is racially discriminating is to compare how people of different races were treated and see whether the treatment was unequal. But when data about race does not even exist, regulators have to guess.
“[The lack of data] puts them in a position where they have to measure disparate impact, but they have a hard time measuring disparate impact because they don’t know who is who,” said Prof. Nathan Kallus, operations research and information engineering, a co-author of the study.
“Disparate impact” refers to a disparity in outcomes between two similarly qualified groups that differ in characteristics like race or religion.
Because assessors do not know the race or gender of applicants, they use an algorithm to estimate race based on names and locations. But such algorithms are not necessarily accurate, posing problems for regulators and the banks they oversee.
Now, these algorithms are coming under scrutiny after a study by Cornell researchers found that small changes in evaluated factors could lead to big differences in the estimated levels of discrimination of lenders.
Addressing this issue, a team of Cornell researchers recently published a paper that examined the accuracy of machine learning algorithms which measure the extent to which banks discriminate.
“It’s a very serious problem because now [neither] the banks nor the regulators have access to the race label but they still have to do the assessment, they need to assess whether the banks are complying with the laws,” said Xiaojie Mao grad, lead author of the article.
The industry-standard approach, known as the proxy method, has cost banks millions in fines. In 2013, regulators fined one bank about $100 million, Mao said.
The proxy method takes into account two factors to estimate the race of applicants: name and geolocation. Using these two characteristics, the algorithm assigns each applicant a probability of belonging to each racial group.
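The combination of name and geolocation described above can be sketched in a few lines of Python. This is a minimal, illustrative version of a proxy calculation that merges the two evidence sources with Bayes’ rule; the surname, tract and population probabilities below are made-up numbers, not real census figures, and real implementations involve far larger lookup tables.

```python
# Hypothetical P(race | surname), e.g. from a surname frequency table.
p_race_given_surname = {
    "GARCIA": {"white": 0.05, "black": 0.01, "hispanic": 0.92, "other": 0.02},
}

# Hypothetical P(race | geography) for one census tract.
p_race_given_geo = {
    "tract_123": {"white": 0.60, "black": 0.20, "hispanic": 0.15, "other": 0.05},
}

# Hypothetical overall population shares, P(race).
p_race = {"white": 0.60, "black": 0.13, "hispanic": 0.19, "other": 0.08}

def proxy_posterior(surname, tract):
    """Combine surname and geography evidence with Bayes' rule:
    P(race | surname, geo) is proportional to
    P(race | surname) * P(race | geo) / P(race)."""
    scores = {
        race: (p_race_given_surname[surname][race]
               * p_race_given_geo[tract][race]
               / p_race[race])
        for race in p_race
    }
    total = sum(scores.values())
    # Normalize so the probabilities sum to 1.
    return {race: s / total for race, s in scores.items()}

probs = proxy_posterior("GARCIA", "tract_123")
```

With these toy numbers, the strong surname signal dominates the geographic signal, which illustrates the study’s concern: small changes in either input table can shift the estimated probabilities, and any downstream discrimination estimate built on them.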
To test the algorithm’s accuracy, the researchers examined the one lending market where racial data is readily available: mortgages. By comparing the proxy model’s estimates against the real data, they found that the algorithm tended to overestimate the amount of discrimination that occurred.
Going forward, Mao wishes to see smarter ways to collect people’s racial data in lending applications. However, he understands that people are reluctant to supply such data for fear of it being used to discriminate against them in the first place.
“Regulators and banks should collaborate together to find a smarter way to collect that information without intruding on people’s privacy, without violating the laws,” Mao told The Sun.