
In recent years, pharmaceutical companies have been working hard to support patient-centric drug development. Incorporating patients’ perspectives on therapeutics is becoming an important part of drug discovery and development: it can, for example, reveal side effects not reported in clinical trials or different disease manifestations among sub-populations.
Patients increasingly narrate their experience of disease on social media. Twitter, Facebook, Instagram and Snapchat are a few examples of platforms that enable patients to share their experiences with illness, medications and clinical visits.
Social media can create a community-like spirit, which is often valued by patient groups who benefit from the support of others with similar experiences. Research has shown that patients with critical illnesses and disabilities often turn to social media for self-help and to share their personal perspectives.
As a result, patient posts are often written in informal language and can contain valuable information that would not necessarily be disclosed to a healthcare professional.
It has been suggested that the insights gained by analysing social media data could be leveraged to support patient-centric drug development. While the vast and complex data sets pose a problem for manual analysis, artificial intelligence (AI) enables automated, cost-effective processing, an approach known as social media mining (SMM).
SMM is an emerging field of research which entails the “extraction and analysis of data gathered from online forums, blogs, and social-media platforms to gain knowledge concerning specific communities, as well as their members’ perceptions and needs”.
One of the challenges with social media mining is that, unlike scientific literature, comments are written by non-expert users who do not necessarily follow standard grammar or report accurate observations supported by evidence.
Social media sources are described as large and noisy, consisting of unstructured textual data. This makes mining social media more complicated and challenging than mining scientific papers, for example.
A typical SMM pipeline comprises five fundamental stages for extracting insights from social media data: resource identification, data extraction, data preprocessing, data analysis, and evaluation.
All five stages remain relatively consistent across different applications, including the extraction of data relating to healthcare, disease experience and perspectives on treatment. With innovative approaches such as AI, social media mining looks increasingly promising for the pharmaceutical industry.
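To make the five stages concrete, here is a minimal sketch in Python that uses only the standard library. It is an illustration under stated assumptions rather than a description of any real SMM system: the sample posts, the drug name "DrugX", the symptom terms and every function name are hypothetical, and a production pipeline would pull data from platform APIs and use far richer analysis.

```python
import re
from collections import Counter

# Hypothetical in-memory posts standing in for data collected from real platforms.
SAMPLE_POSTS = [
    {"source": "forum", "text": "Started DrugX last week, terrible HEADACHE every morning!!"},
    {"source": "forum", "text": "drugx gave me nausea and a headache :("},
    {"source": "blog", "text": "Lovely weather today, went for a walk."},
]

def identify_resources(posts, allowed_sources):
    """Stage 1 (resource identification): keep posts from the chosen sources."""
    return [p for p in posts if p["source"] in allowed_sources]

def extract(posts, keyword):
    """Stage 2 (data extraction): keep the text of posts that mention the keyword."""
    return [p["text"] for p in posts if keyword.lower() in p["text"].lower()]

def preprocess(text):
    """Stage 3 (data preprocessing): lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def analyse(token_lists, terms):
    """Stage 4 (data analysis): count how many posts mention each term of interest."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(set(tokens) & terms)
    return counts

def evaluate(counts, expected):
    """Stage 5 (evaluation): compare the counts with a hand-labelled expectation."""
    return all(counts[term] == n for term, n in expected.items())

posts = identify_resources(SAMPLE_POSTS, {"forum"})
texts = extract(posts, "DrugX")
tokens = [preprocess(t) for t in texts]
counts = analyse(tokens, {"headache", "nausea", "rash"})
print(counts, evaluate(counts, {"headache": 2, "nausea": 1, "rash": 0}))
```

Keeping each stage as a separate function mirrors the point that the stages stay largely the same across applications: only the sources, keywords and terms of interest would need to change.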
While therapeutic efficacy is the primary goal for a drug developer, the safety profile of a product is of paramount importance. A number of studies have demonstrated how social media mining can be used to support pharmacovigilance. Adverse drug events can have profound financial implications for the pharmaceutical industry.
Since 2014, side effects associated with Pradaxa (a prescription medicine for blood clots in the veins of the legs and lungs) have resulted in $650 million in lawsuit settlements. These risks are especially high when novel therapeutics are brought to market, so avoiding such outcomes is a significant priority for pharma companies.
A study several years ago investigated comments posted in a medical forum to identify reported adverse drug events. The team used natural language processing (NLP) to develop a system that extracted adverse drug reactions from the text. NLP refers to the automated computational processing of human language, which can take new, unstructured data from any source and convert it into structured formats to which AI and machine learning (ML) can then be applied.
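As a rough illustration of turning unstructured posts into structured records, the sketch below uses a simple phrase lexicon rather than the NLP system built in the study, whose details are not given here. The lexicon, the drug names "DrugX" and "DrugY" and the function name are all hypothetical placeholders.

```python
import re

# Hypothetical lexicon mapping informal phrases to a normalised reaction term.
REACTION_LEXICON = {
    "threw up": "vomiting",
    "throwing up": "vomiting",
    "can't sleep": "insomnia",
    "dizzy": "dizziness",
    "headache": "headache",
}

# Placeholder drug names; a real system would use a curated drug dictionary.
DRUG_NAMES = {"drugx", "drugy"}

def extract_adr_records(post):
    """Return one {"drug", "reaction"} record per drug and reaction found in a post."""
    text = post.lower()
    drugs = sorted(
        d for d in DRUG_NAMES if re.search(rf"\b{re.escape(d)}\b", text)
    )
    reactions = sorted(
        {
            term
            for phrase, term in REACTION_LEXICON.items()
            if re.search(rf"\b{re.escape(phrase)}\b", text)
        }
    )
    return [{"drug": d, "reaction": r} for d in drugs for r in reactions]

records = extract_adr_records("Took DrugX yesterday, threw up twice and felt dizzy all day")
print(records)
```

Note that this only detects co-occurrence: it does not handle negation ("no headache") or establish that the drug caused the reaction, which is part of why real systems rely on more sophisticated NLP.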
Mining patient-reported adverse events in this way is important for a number of reasons: (1) patients may not feel comfortable reporting specific side effects during clinical trials; (2) different adverse effects may arise in sub-groups within a patient population, defined for example by ethnicity, sex or disease progression.
Although patients typically use medically oriented social media to describe drug-associated adverse events, their experiences can also highlight new indications for existing medications. A well-known example is zolpidem, a medication prescribed for insomnia. Through social media and patient reviews, however, it became evident that the drug was also being used for brain injury.
While the accuracy of social media reports of beneficial effects may be questionable, it has been emphasised that the value of drug repurposing and the sheer volume of social media data make it worthwhile to study this data and potentially identify drug repurposing candidates.
Recruiting patients from the target population is essential for the success of clinical trials, and social media platforms are increasingly being used to support this. One particular study used topic models to generate semantic features via feature transformation.
In data science, feature transformation refers to the process of modifying raw data while retaining the key information. The features were then used in a supervised machine learning algorithm to identify Facebook users who had been diagnosed with different diseases, which made it possible to identify more accurately a target population from which companies can recruit.
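The idea of transforming posts into features and then training a supervised classifier can be sketched as follows. This is not the study's method: hand-picked keyword themes stand in for the topics a topic model would learn, and a nearest-centroid classifier stands in for whichever supervised algorithm was used. The themes, labels, example posts and function names are all hypothetical.

```python
import re

# Hypothetical keyword themes standing in for topics learned by a topic model.
THEMES = {
    "glucose": {"insulin", "sugar", "glucose"},
    "breathing": {"inhaler", "wheezing", "breath"},
}

def to_features(post):
    """Feature transformation: map raw text to one count per theme."""
    tokens = re.findall(r"[a-z]+", post.lower())
    return [sum(t in words for t in tokens) for words in THEMES.values()]

def train_centroids(posts, labels):
    """Supervised step: average the feature vectors of each labelled class."""
    sums, counts = {}, {}
    for post, label in zip(posts, labels):
        vec = to_features(post)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, post):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    vec = to_features(post)

    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: distance(centroids[label]))

# Hypothetical labelled training posts.
train_posts = [
    "Checked my sugar and took insulin",
    "My glucose was high again",
    "Needed my inhaler, wheezing all night",
    "Short of breath after stairs",
]
train_labels = ["diabetes", "diabetes", "asthma", "asthma"]

centroids = train_centroids(train_posts, train_labels)
print(predict(centroids, "Insulin dose changed today"))
```

The transformation discards most of the raw text but keeps the information the classifier needs, which is the sense in which the key information is retained.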
Of course, there are obstacles for pharmaceutical companies hoping to take advantage of the vast amount of social media data available for mining. The rules and regulations governing access to social media users’ data are typically complex, but with GDPR rules potentially changing for the UK in the near future, pharmaceutical companies may be able to access such data more easily.
Charlotte Di Salvo, Editor & Lead Medical Writer
PharmaFeatures
