
9 GitHub Repositories Found Leaking Health Data from Over 150K Patients

A new collaborative report from Jelle Ursem and DataBreaches.net found that nine GitHub repositories were routinely leaking a trove of protected health information (PHI) from at least 150,000 patients.

Improper access controls have left the data of 150,000 to 200,000 patients, and likely more, exposed online in at least nine GitHub repositories, highlighting the need for better vendor management and security processes, according to the report.

Ursem, a security researcher from the Netherlands, has spent more than a year and a half examining online databases, work that has uncovered more than 400 data leaks from a range of organizations, including private sector entities, Fortune 500 companies, and government agencies.

The new report, No Need to Hack When It’s Leaking, sheds light on this trend as it pertains to medical data. Ursem found protected health information leaking through GitHub from:

  • Xybion, a software, services and consulting company  
  • MedPro, a medical billing and management services vendor  
  • Sumana Ketha, M.D., who owns a range of domains and entities in Texas, including Texas Physician House Calls
  • VirMedica, an e-access technology solutions vendor
  • MaineCare, a state- and federally funded healthcare program
  • Waystar, a revenue cycle management solutions vendor  
  • Shields Healthcare Group, a New England-based MRI provider  
  • Acc-q Data Network, a medical billing service provider

The report mirrors earlier findings from IntSights, which revealed that one-third of healthcare databases stored both locally and in the cloud are leaking data online. More concerning still, reports show misconfigured databases can be hacked in less than eight hours.

GitHub repositories are notoriously abused by cybercriminals. By 2016, threat actors were using password-reuse attacks to exploit information stored on GitHub and gather data useful for later attacks. The best-known cases involved Uber and Lynda.com.

Ursem set out to discover whether these repositories were also leaving medical data vulnerable. It took him just 10 minutes to find medical data exposed on GitHub, using simple search phrases such as ‘companyname password’ or ‘medicaid password FTP’ to surface potentially vulnerable, hard-coded system credentials.

The discovered credentials could belong to a database, a Microsoft Office365 account, or a secure file transfer (SFTP) host. Ursem would enter them into “the right software and hit connect,” logging into the system through the front door.

“Once logged in to a Microsoft Office365 or Google G Suite environment, Ursem is often able to see everything an employee sees: contracts, user data, internal agendas, internal documents, emails, address books, team chats, and more,” according to the report. 

“Just think about what that would mean if someone is able to snoop around as if they were your employee without ever triggering any alarms,” it continued. 

The newly discovered medical data leaks were most commonly caused by developer practices, including embedding hard-coded login credentials in code rather than supplying them through a configuration option on the server where the code runs, using public repositories, failing to enable two-factor or multi-factor authentication on email accounts, and abandoning repositories instead of deleting the data once it was no longer needed.
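To make the first of those practices concrete, here is a minimal Python sketch, assuming credentials are supplied as environment variables on the server. The variable names `DB_HOST`, `DB_USER`, and `DB_PASSWORD`, and the helpers `get_db_credentials` and `MissingCredentialError`, are hypothetical and do not come from the report. The repository then holds only variable names, so a public or abandoned repo no longer hands out working passwords.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not configured on the server."""


# Hypothetical variable names; use whatever your deployment defines.
REQUIRED_VARS = ("DB_HOST", "DB_USER", "DB_PASSWORD")


def get_db_credentials(env=None):
    """Read database credentials from the environment instead of source code.

    The anti-pattern this replaces is a literal such as
        DB_PASSWORD = "hunter2"
    committed to the repository, where anyone who finds it can log in.
    """
    env = os.environ if env is None else env
    creds = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            # Fail loudly rather than fall back to a default password.
            raise MissingCredentialError(f"{name} is not set on this server")
        creds[name] = value
    return creds
```

On the server, code would call `get_db_credentials()` with the values set through the host's configuration rather than committed to Git. Raising an error when a value is missing is a deliberate choice: it avoids silently falling back to a default password.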

What’s more, many of these errors went undetected for months or even years because the companies failed to audit the compliance and security policies of their developers, lacked monitored accounts where researchers could report security concerns, or failed to respond to responsible disclosure attempts.

For example, the Xybion leak, found in February 2020, was caused by a developer leaving code that contained system credentials in a public repository. Combined with other exposed code, those credentials gave Ursem access to the vendor’s billing back offices, including data on nearly 7,000 patients and more than 11,000 health insurance claims.

It took multiple attempts from Ursem and DataBreaches.net for the access to be removed. Further, it’s currently unclear whether the Department of Health and Human Services has been notified about the PHI breach. 

Meanwhile, the MedPro leak was caused by developer errors and went on for several years through an exposed SFTP server, a backup database, and an Outlook mail account. According to Ursem, it appears that once the system was set up, the mailbox was likely never reviewed again. It also appears that the firm’s email account was compromised by spammers. 

“Any credentials needed to access them could be found in the developer's public repository, where several (working) Windows domain access credentials were also listed,” the report authors explained. “The earliest exposed files on the SFTP server appeared to date from 2015. The GitHub repositories have been online since 2016.”

Notably, in the Texas Physician House Calls leak, the developer “unwillingly and likely unknowingly” uploaded malware and integrated it into the codebase in two separate places. The malware was a PHP web shell capable of connecting to a server in Ukraine.

As a result, it’s likely all client data stored on the server since 2017 was compromised. 

Also notable was the VirMedica leak, first discovered in February. VirMedica was acquired by CareMetx in 2019. The code, along with FTP login credentials and large .csv and Excel files, was first uploaded to GitHub in 2018.

The leak exposed PHI for about 40,000 patients, including demographic details, diagnoses, health insurance information, provider details, and other sensitive data. Counsel for VirMedica told the researchers that its investigation confirmed only Ursem’s IP address had accessed the files.

The MaineCare leak exposed a trove of patient data as well as the developer's own information, including VPN login credentials, design documents, domain accounts, help desk credentials, and mainecare.maine.gov production usernames, passwords, and locations, which gave Ursem access to PHI once he logged in.

The report goes into further detail on these breaches, including developers who are repeat offenders and problems with notifications and responses to responsible disclosures. Healthcare entities should review it to avoid similar errors.

The report also lays out simple steps developers can take to avoid similar mistakes, including deploying IP address whitelists, enforcing password resets, and providing responsible disclosure mechanisms.
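As an illustration of the IP address whitelist idea, here is a minimal Python sketch using the standard `ipaddress` module. The networks listed are hypothetical documentation ranges (RFC 5737), and `is_allowed` is an illustrative helper, not something the report specifies.

```python
import ipaddress

# Hypothetical allowlist: an office range and a single VPN egress address.
# Both use documentation-reserved prefixes (RFC 5737), not real networks.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_allowed(client_ip, allowed=ALLOWED_NETWORKS):
    """Return True only if client_ip falls inside one of the allowed networks."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: deny by default
    # An IPv6 address is never "in" an IPv4 network, so it is denied here too.
    return any(addr in net for net in allowed)
```

With a deny-by-default check like this in front of a login, a password copied from a public repository is not enough on its own; the connection must also come from an approved address.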

Healthcare entities must also improve how they audit their business associates, vendors, and developers, as well as how they respond to responsible disclosures, as “at least three of the nine entities intentionally did not respond to early notification attempts and would later claim that they had been fearful the notifications were a social engineering attack. Their failure to respond left PHI exposed even longer.”
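One way an entity might begin auditing developer code is a simple scan for strings that look like hard-coded credentials. The Python sketch below is a rough illustration only, assuming a local checkout of the repository; the patterns, file suffixes, and function names are hypothetical, and dedicated secret-scanning tools use much larger rule sets.

```python
import re
from pathlib import Path

# Hypothetical, deliberately simple patterns for illustration only.
SUSPICIOUS_PATTERNS = [
    # password/secret/api key assigned a quoted literal, e.g. DB_PASSWORD = "hunter2"
    re.compile(r"""(password|passwd|pwd|secret|api_?key)\s*[:=]\s*["'][^"']+["']""", re.IGNORECASE),
    # user:password embedded in an FTP/SFTP URL, e.g. ftp://bob:pa55@host/
    re.compile(r"s?ftp://[^\s:/@]+:[^\s@/]+@", re.IGNORECASE),
]

# Hypothetical set of file types worth checking.
SCANNED_SUFFIXES = {".py", ".php", ".js", ".json", ".yml", ".yaml", ".xml", ".config", ".ini"}


def scan_text(text):
    """Return (line_number, stripped_line) pairs that look like hard-coded credentials."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


def scan_repo(root):
    """Walk a local checkout and map each suspicious file path to its hits."""
    findings = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in SCANNED_SUFFIXES:
            hits = scan_text(path.read_text(errors="ignore"))
            if hits:
                findings[str(path)] = hits
    return findings
```

Reading a value from the environment (for example `os.environ["DB_PASSWORD"]`) is not flagged, because the first pattern only matches a keyword assigned a quoted literal.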

Notably, many of these entities were outsourced or contracted developers. 

Employees must be trained as the first line of support, with clear procedures for escalating notifications received from researchers. Administrators should also routinely search GitHub for their firm’s name and domain names, as “even if you do not use a developer, one of your business associates or vendors might.”
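That routine search could be partly scripted. This Python sketch only builds GitHub code-search URLs for a reviewer to open in a browser; the firm name, domain, and keyword list are hypothetical placeholders, with keywords modeled on the report's ‘companyname password’ style of query.

```python
from urllib.parse import urlencode

# Hypothetical identifiers: replace with your firm's actual names and domains.
FIRM_TERMS = ["Example Health", "examplehealth.com"]
# Hypothetical keywords, modeled on searches such as 'companyname password'.
KEYWORDS = ["password", "ftp", "smtp"]


def github_search_urls(terms, keywords):
    """Build one GitHub code-search URL per (term, keyword) pair."""
    urls = []
    for term in terms:
        for keyword in keywords:
            # Quote the term so multi-word names are searched as a phrase.
            query = f'"{term}" {keyword}'
            params = urlencode({"q": query, "type": "code"})
            urls.append(f"https://github.com/search?{params}")
    return urls


for url in github_search_urls(FIRM_TERMS, KEYWORDS):
    print(url)
```

Automating the searches themselves through GitHub's API would be an alternative, subject to its authentication requirements and rate limits.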

“The number of threat actors misusing or attacking GitHub repositories is anyone’s guess, because we suspect that no one actually checks or audits logs unless something makes them urgently aware of the need to check their security,” the report authors wrote. 

“But attacks are only one risk entities face. Perhaps the more pernicious risk is the risk of leaks that go undetected but may be capitalized on by threat actors or those who would hack but lack sufficient skills,” they continued.
