Social media analytics best practices tempered by privacy laws
Social media provides a fertile landscape of information and insight into consumer behavior, but collecting and analyzing that data carries all sorts of privacy pitfalls.
Social media data can help improve a company's marketing ROI by targeting consumer behavior, demographics, personas, sentiment and social trends, but the road to fulfilling these critical goals is fraught with potential privacy hazards.
Individual social media profiles tend to include personally identifiable information and other sensitive data, making it imperative that data scientists understand what they can and can't use and for what purpose. Issues surrounding social media data privacy can vary depending on the type of data collected and how it's used. Applicable laws and regulations, social media platform contracts, the data subject's expectations of privacy, the use case and the organization's values all factor into decisions on collecting and using social media data.
"We ask clients three questions: What is the purpose for processing or using the data? What is your legal basis? Does it feel creepy?" said Rita Zurbrigg, an analyst at Info-Tech Research Group.
Not all social media data is equally usable, so it shouldn't be collected or used as if it were.
"Understanding what kind of data meets the depth of protection is critical," said Kiran Sreewastav, vice president of data management and architecture at Cognetik, a data science and analytics company based in Cary, N.C. "[It's important to understand] the different tiers of sensitive data and personal data and then build your practices around that."
Navigating the gray areas
Unfortunately, there's no simple black-and-white answer to the question of what social media data can be used. "It's incredibly contextual," noted Omer Tene, vice president and chief knowledge officer at the International Association of Privacy Professionals. "It depends on the location, applicable laws, the context of the platform and the rules and the kind of data."
Data practitioners need to be concerned not only about the data they collect, but also about its primary and secondary uses and about who they collect it from.
"There are a lot of aggregators out there scraping social media sites and the information they collect, they resell or they buy," Sreewastav explained. "That data gets aggregated, and enterprises get that second or thirdhand data, so you're not only protecting the integrity of the data you're acquiring, you have to understand who you're buying it from and what their level of compliance is."
Gartner predicted that, by next year, "the backup and archiving of personal data will represent the largest privacy risk for 70% of organizations," compared to just 10% in 2018. "Over the next two years," Gartner added in a report on privacy issues, "organizations that don't revise data retention policies to reduce the overall data held, and by extension the data that is backed up, will face a huge sanction risk for noncompliance as well as the impacts associated with an eventual data breach."
Regulatory climate
National and international data protection laws continue to proliferate. The Cambridge Analytica/Facebook fiasco, Equifax breach and Google's failure to disclose a Google+ bug are all examples of dubious practices from which lawmakers and regulators want to protect individuals. GDPR and the California Consumer Privacy Act (CCPA) are two oft-cited examples.
"The genesis and history of the CCPA and how it became so swiftly enacted are rooted in Facebook and Cambridge Analytica," said Cinthia Motley, director of the global data privacy and information security practice at law firm Dykema. "There's such an outgrowth after GDPR -- and even in the U.S. -- as the result of CCPA. Many states are already pending with copy-and-paste laws of CCPA."
Through January 2019, European data protection authorities had reported more than 95,000 complaints under GDPR from individuals who believed their privacy rights had been violated, according to the European Commission's Data Protection Board.
The collection and use of social media data is legally complex. "The first question is, how is the data collected?" Tene said. "Is it collected through a partnership collaboration with the social media platform or scraped? If it's scraped, it raises a couple of legal issues [that are] a violation of terms of use because the platform typically prohibits scraping, and some even argue it's a violation of the Computer Fraud and Abuse Act."
Companies doing business in Europe scrambled to comply with GDPR before the May 2018 compliance deadline because the fines are so onerous. Other countries are following suit, essentially adopting GDPR or adapting portions of it to their laws.
"Personal information is considered a fundamental human right, and most laws around the world have that concept," Motley said. "Canada, like many other countries, has followed [GDPR]. Brazil is another one doing a complete GDPR adoption. China adds a criminal offense."
Data scientists are wise not to make assumptions about the legality of collecting or using social media data. Instead, they should seek the guidance of their company's legal department, general counsel, outside counsel or chief privacy officer. More fundamentally, companies should have solid data governance programs in place as well as data protection training for employees to minimize potential risks.
Just how reliable is that data anyway?
There are some stark differences between traditional enterprise data, such as transactional data, and what can be found on social media -- the two most obvious being authenticity and veracity. For one thing, the source of social media data might not be obvious. Is it a person, a bot, a mercenary or a nation-state? Is the "information" real or fake? Even an actual person's profile can intentionally contain fictional data.
"It's not just what people say on social media; it's what people say has changed or what people say based on false assumptions," said Dana Simberkoff, chief risk, privacy and information security officer at independent software maker AvePoint. "Is that valid data? I don't think so."
Social media mining is a combination of art and a different science, Sreewastav said. "[I]t's really taking social media data and inferring it the right way," she reasoned. "The quality of your inference is directly correlated with the outcomes you want to derive."
Two potential pitfalls are substituting inferences for data and algorithmic bias that can result in discrimination. "[T]here's a lot of noise and a lot of inaccuracy," Tene said, "so decisions made on the basis [of social media] data can be just outright wrong or biased against a type of population that's underrepresented or overrepresented. I think there are data quality challenges when you're dealing with enormous data sets like this."
GDPR requires a legal basis for collecting personal information. "One thing I like about GDPR is the idea of data minimization and purpose limitation," Simberkoff said. But good social media data governance involves a lot more than simply adhering to data collection and privacy regulations. "Make sure you know what information you're collecting, who you're collecting it from, what you're doing with it, who can access it, how long you're keeping it, where you're putting it and when you're getting rid of it," she added.
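One way a data team might make Simberkoff's checklist concrete is to keep an inventory record for every social media data set it holds. The Python sketch below is illustrative only: the `DataInventoryRecord` class, its field names, the sample values and the 90-day retention period are all hypothetical, not drawn from GDPR, CCPA or any vendor's tooling, and the right retention period is a question for counsel rather than for code.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DataInventoryRecord:
    """Hypothetical inventory entry for one collected data set."""
    description: str           # what information is being collected
    source: str                # who it is collected from
    purpose: str               # what it is being used for
    legal_basis: str           # documented basis for processing
    allowed_roles: list[str]   # who can access it
    storage_location: str      # where it is kept
    collected_on: date         # when it was collected
    retention_days: int        # how long it may be kept (assumed policy)

    def delete_by(self) -> date:
        """Date by which the data set should be removed."""
        return self.collected_on + timedelta(days=self.retention_days)

    def is_overdue(self, today: date) -> bool:
        """True if the data set has outlived its retention period."""
        return today > self.delete_by()


def overdue_records(records: list[DataInventoryRecord],
                    today: date) -> list[DataInventoryRecord]:
    """Return the records that should already have been deleted."""
    return [r for r in records if r.is_overdue(today)]


# Illustrative values only.
example = DataInventoryRecord(
    description="Public post text and timestamps",
    source="Platform partner feed",
    purpose="Aggregate sentiment analysis",
    legal_basis="To be confirmed by counsel",
    allowed_roles=["analytics"],
    storage_location="analytics-warehouse",
    collected_on=date(2019, 1, 1),
    retention_days=90,
)
```

A scheduled job could run `overdue_records` against the full inventory and flag anything past its deletion date, including copies held in backups and archives.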