What Can We Do To Improve Mental Health Support In The Tech Industry?

According to statistics from Johns Hopkins University, an estimated 26% of Americans ages 18 and older suffer from a diagnosable mental illness in a given year. In tech entrepreneurship in particular, mental health concerns are often described as “founder’s blues.” For all the brilliant minds developing innovative solutions to social and economic problems, people in the tech industry often struggle daily with high expectations, a fast work pace, and a heavy workload.

Beyond raising public awareness of mental health and breaking the social stigma around it, workplace culture plays an essential role in tackling this issue. It would be extremely helpful and meaningful to offer a range of resources and to create a workplace where people feel comfortable sharing their struggles and seeking help from colleagues, managers, and mental health professionals without fear of shame or judgment.

Image from Science Focus

Data Cleaning and Preparation

I found a dataset at https://osmihelp.org/research that contains 1,525 OSMI Mental Health in Tech Survey responses from 2017 to 2019. Since survey responses tend to be messy and contain many missing values, I cleaned and prepared the data before further analysis.

To identify the key factors that affect whether someone seeks professional help for mental health challenges, I used the binary variable “treatment” as my response variable. It indicates whether a respondent has ever sought treatment for a mental health disorder from a mental health professional. The remaining 23 variables are the independent variables in my classification models. I do want to emphasize that my dataset may suffer from selection bias: people who voluntarily participate in the OSMI Mental Health Survey are probably already managing their mental health concerns, or are at least aware of the issue, which could skew my results. Many people may have experienced mental health challenges without ever paying attention to them or managing them, for various reasons, and they would not be reflected in my analysis.

To assess the prediction accuracy of each model more reliably and to guard against overfitting and underfitting, I first randomly split my dataset into a training set containing 70% of the data and a test set containing the remaining 30%. Next, I used the Lasso shrinkage method for feature selection. With the variables selected by Lasso, I fit three post-Lasso classification models: Logistic Regression, Decision Tree, and Random Forest.
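The random 70/30 split described above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the code behind the analysis; the function name `train_test_split`, the seed, and the stand-in `records` list are all assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical stand-in for the 1,525 survey records (row indices only).
records = list(range(1525))
train, test = train_test_split(records)
print(len(train), len(test))
```

Fixing the seed matters here: all three models should be trained and evaluated on the same split so that their test accuracies are comparable.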

Modeling and Evaluation

Logistic Regression performed quite well: with a classification threshold of 0.5, it reached a prediction accuracy of 0.859 based on the confusion matrix for my test set. The model summary also makes it relatively easy to interpret. Looking closely at the sign of each coefficient, I found that people who identify as male are less likely to seek help than those who identify as female, holding other variables constant. People who identify as mixed-race are more likely to seek help for their mental health struggles than people of other races. Additionally, when employers provide mental health benefits as part of healthcare coverage, or offer resources to learn more about mental health disorders and options for help, employees appear to seek help more often. Furthermore, employees who gave a higher rating for their willingness to tell friends and family that they have a mental illness tend to be more willing to seek advice from mental health professionals than those who gave a lower rating.
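To make the accuracy figure concrete, here is a minimal sketch of how accuracy is read off a confusion matrix once predicted probabilities are cut at 0.5. It is written in Python for illustration only; the labels and probabilities are made up and are not taken from the survey data.

```python
def confusion_matrix(y_true, probabilities, threshold=0.5):
    """Count TP, FP, TN, FN after thresholding predicted probabilities."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, p in zip(y_true, probabilities):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

def accuracy(counts):
    """Share of all predictions that were correct."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())

# Made-up labels (1 = sought treatment) and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3]
cm = confusion_matrix(y_true, probs)
print(cm, accuracy(cm))  # 6 of 8 correct, so accuracy is 0.75
```

Moving the threshold away from 0.5 trades false positives against false negatives, so a reported accuracy is always tied to the threshold that produced it.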

Because it is interpretable and can help surface actionable insights, I also fit a Decision Tree on my training set, which gave a prediction accuracy of 0.864 on the test set. As shown in the plot below, past and current mental health status play a significant role in determining whether someone has sought help for a mental health disorder, which matches our intuition. Interestingly, age, willingness to tell friends and family about a mental illness, and whether someone has discussed their mental health with coworkers were also among the most important splits for predicting whether someone would seek help from mental health professionals.

Classification Decision Tree Plot

In addition to these two models, I tried Random Forest, one of the powerful “black-box” algorithms, on this classification problem. It gave the highest prediction accuracy of the three models, 0.869. One way to interpret its results and generate meaningful, actionable insights is a variable importance plot, shown below, which ranks the variables by the decrease in node impurity from splitting on each one. According to the plot, the following variables play a relatively significant role in predicting whether someone would seek professional treatment for a mental health disorder, compared with the rest: past and current mental health condition, age, willingness to share a mental illness with friends and family, the overall importance one’s employer places on mental health, family history of mental illness, and one’s overall rating of the tech industry’s support for employees with mental health issues.

Variable Importance Plot for Random Forest
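The "decrease in node impurity" behind such a ranking can be illustrated for a single split. The sketch below is in Python and assumes Gini impurity as the impurity measure, which is an assumption on my part for the example; a random forest's importance score aggregates decreases like this across every split on a variable in every tree. The toy labels are made up.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)   # share of positive labels
    return 2 * p * (1 - p)          # equals 1 - p^2 - (1 - p)^2

def impurity_decrease(parent, left, right):
    """Drop in Gini impurity from splitting parent into left and right."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children

# Toy node: four respondents who sought treatment (1) and four who did not (0).
parent = [1, 1, 1, 1, 0, 0, 0, 0]

# A split that separates the classes perfectly removes all impurity.
informative = impurity_decrease(parent, [1, 1, 1, 1], [0, 0, 0, 0])

# A split that leaves both children as mixed as the parent removes none.
uninformative = impurity_decrease(parent, [1, 1, 0, 0], [1, 1, 0, 0])

print(informative, uninformative)  # 0.5 0.0
```

Variables that repeatedly produce splits like the first one accumulate a large total decrease and land near the top of the plot.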

Lastly, I ran a simple text analysis of the last question on the survey, which asks employees to briefly describe what they think the tech industry as a whole and/or employers could do to improve mental health support for employees. From the responses, I generated a word cloud in R to see which keywords come up most often in this free-response question. As shown in the plot below, words such as “talk,” “awareness,” “open,” “stigma,” “help,” “time,” “resources,” and “culture” occurred frequently.

Word Cloud on Survey Free Response
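A word cloud is a visual rendering of word frequencies, so the counting step underneath it can be sketched on its own. The word cloud above was generated in R; the snippet below is a Python illustration of the idea only, with made-up responses and a deliberately tiny, hypothetical stopword list.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list for the example.
STOPWORDS = {"the", "a", "an", "to", "and", "of", "about", "more",
             "be", "it", "for", "is", "in", "can", "where"}

def word_frequencies(responses):
    """Count non-stopword tokens across all free-text responses."""
    counts = Counter()
    for text in responses:
        words = re.findall(r"[a-z']+", text.lower())   # lowercase word tokens
        counts.update(w for w in words if w not in STOPWORDS)
    return counts

# Made-up responses, not taken from the survey.
responses = [
    "Talk about it more openly and reduce the stigma.",
    "More awareness and more resources for employees.",
    "Create an open culture where people can talk.",
]
freqs = word_frequencies(responses)
print(freqs.most_common(3))
```

Note that "open" and "openly" are counted separately here; stemming or lemmatizing the tokens first would merge such variants.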

Recommendation and Discussion

Companies in the tech industry could improve mental health support for their employees by acting on the key factors identified in my classification models. They could offer more mental health resources and benefits and create a more open and supportive working environment, so that a larger share of employees seek help from mental health professionals when they experience mental health challenges. The results of my text analysis echo the key variables identified in the classification models and give employers a clearer picture of what they could try to change. In general, employees believe that employers and/or the tech industry as a whole could help raise awareness, be more open about mental health struggles, break the social stigma, offer more opportunities to talk about these experiences, and provide more mental health benefits and resources. For instance, during the COVID-19 pandemic, companies could organize confidential virtual meetings where employees can talk about their past or current struggles with colleagues who may have had similar experiences. It might also be a good idea for employers or managers to share their own stories in such settings, so that employees feel safe and supported at work and open to sharing their own mental health challenges.

Even though more and more people are becoming aware of mental health challenges and actively managing their mental health, much remains to be done. Given the stigma around mental health and people’s hesitance to share and cope with these challenges openly, companies should be very careful when applying my results. Any changes to the workplace should reflect the specific structure and conditions of their own organization, as well as what their employees are looking for and feel most comfortable with. No single solution or system will work perfectly for every organization in the tech industry. Another important point is that the predictive power of machine learning algorithms and data mining depends largely on the data they are trained on. For example, during the Exploratory Data Analysis, I noticed that my dataset contains disproportionately more records for males than females and more records for Caucasians than people of color, which can impair algorithmic fairness and bias the model’s predictions. Collecting more data from marginalized groups and minorities would help mitigate this issue, improve the predictive performance of my models, and generate more meaningful and actionable insights.

If you are interested, please feel free to check out my code on GitHub. I would love to have a conversation with you about mental health in the tech industry.
