After spending 1.5 years in an MS programme at CMU, I’ve decided to move into an applied ML role in industry instead of continuing on to a PhD. I’ve had a lot of discussions with people about the case for and against a PhD, and I thought it’d be useful to explain my thought process in deciding definitively against one. While I’ve tried to be objective in listing out the arguments for and against, at the end of the day the choice of which arguments outweigh the others is obviously a deeply subjective one, so I want to emphasize that the focus of this post is to explain why I decided against a PhD.
Experience at CMU
First, I want to talk quickly about my experience in the MS programme so as to provide some additional context. I joined the Master’s in Machine Learning programme at Carnegie Mellon University last fall (August 2017), immediately after graduating from the Indian Institute of Technology, Bombay with a Bachelor’s in Mechanical Engineering. (Why I switched from MechE to ML is a whole other discussion, and probably deserves a post of its own at some point.)
The MSML curriculum involves taking 7 courses and one project (called a Data Analysis Project, or DAP) over 3–4 semesters. As the courses are not very relevant to this post, I’ll focus on my experience conducting research for my DAP.
I started working on my research project from my first semester, and I really enjoyed the experience overall. It was a project in collaboration with industry, and so it was very goal-oriented, with periodic updates and clear deadlines and deliverables. Having had very little ML exposure prior to starting, and having zero deep learning experience, I had to ramp up very quickly. I was lucky to have really great teammates and professors guiding me in the project. They absorbed most of the pressure and gave me the time and freedom needed to explore these new ideas at a pace I was comfortable with. As a result, I’m leaving CMU now with a positive outlook on research in general.
The Case For
To start with, let’s look at the case for doing a PhD. If you want to get into academia long term, i.e. become a professor, there’s really no alternative. So in this section I’ll focus on arguments that assume the PhD is a stepping stone to something outside academia, in which case the decision is a lot more debatable.
One of the most tantalizing aspects of doing a PhD is the intellectual freedom you have. Apart from needing some approval from your advisor, you essentially have the freedom to work with the people you choose on any topic that interests you. You have complete control over what you spend your time on, and this is one of the only points in your career where your primary goal is just learning — not learning for the sake of contributing to some organization, but learning for your own personal growth. That’s a powerful sort of control that’s hard to come by in any other line of work.
If you’re looking to go into an industry research role, a PhD gives you a significant boost. There are labs and startups that hire PhDs almost exclusively, and even among the others, having a PhD is, as you’d expect, hugely beneficial. And beyond these, a common theme that I’ve heard from people who’ve completed PhDs is about some of the less tangible skills you pick up. You build a sort of resilience to uncertainty, a unique kind of problem-solving ability that is hard to pick up anywhere else, which plays an immensely positive role irrespective of your career path post-PhD.
These and other similar points are summarized in this excellent post by Andrej Karpathy.
The Case Against
I’ll start by getting the ugliest one out of the way: money. A PhD represents an immense financial opportunity cost, especially in fields like CS. I strongly believe that the quality of your work, including your role, work culture and team, is vastly more important than pay. There is a threshold beyond which your exact compensation hardly matters at all, and I believe that threshold is crossed fairly early in tech jobs. But I don’t need to argue very strongly to convince anyone that the typical PhD stipend falls well below that threshold. You’re likely to catch up fairly quickly with your first post-PhD job, but these 5+ years of diminished financial capacity (and those being some of the prime years of your life to boot) have to be taken into account.
Now that that’s out of the way, let’s look at what is probably my primary reason for passing up on a PhD. I touched on it briefly before, but I am a strong proponent of having a good work/life balance. I also believe that it is very difficult to maintain this balance during a PhD (at least for me personally). I talked earlier about the kind of control you have over your life during the PhD. This is a double-edged sword, however. While you potentially have the freedom to work at your own pace, there is an implicit expectation about how much work you do. The fact that you have no formal work hours means, ironically, that all hours are work hours. And every time someone messages you at midnight about a bug in your code (I’ve been on both ends of this one), every time you pass up on an outing because of all the work piled up in your office, every time you give up some sleep to get your script running, your mental stamina takes another hit.
It becomes easy to get caught up in the idea of the stereotypical grad student, mashing away at their keyboard with headphones on to block out everything, with a handy cup of coffee next to them to get them through the day (or night). I think this notion is naive at best. Especially close to various deadlines, you lose your grip on vital things like a good diet, sufficient sleep and exercise. The slip from doing this only at deadlines to doing it all the time is far too easy. I’ve been way more cognizant about keeping my physical and mental health up than I was in undergrad, but I still fell into this trap too many times.
The key to successfully navigating through a PhD without burning out is, I believe, demarcation. You need to be able to draw the line between research and the rest of your life. That line can come in many forms. Maybe for some people it’s a sharp demarcation — you don’t work beyond X pm as far as possible, or you take Saturdays off. Sometimes it’s blurrier, but you ensure that you keep your priorities straight by allotting enough time to the things that really matter. For me personally, I find it very hard to put up this artificial demarcation. I find it close to impossible to relax when I know there’s work that needs to be done that I could be working on right now. Any breaks that I force myself to take (say, by taking a day off) are typically filled with guilt and that nagging mental itch telling me to get some work done, which defeats the very purpose. This is a deeply personal issue, and most of the PhD students who I’ve interacted with at CMU seem to have a good hold over the issues that I talked about earlier. But I just don’t believe that this atmosphere, where work seeps into every aspect of your life, is one that I can thrive in.
I was tilting away from doing a PhD from the time I started my Master’s, but the final nail in the coffin came during my internship at Twitter this summer. After 7 years of rigorous schooling (2 years of IITJEE prep, 4 years of undergrad and 1 year of grad school), the ability to come home from work and switch off was incredibly refreshing. It wasn’t even something that I realized I was missing until I came back for my final semester of the Master’s after the internship.
So that pretty much ruled out a PhD for me. But I had to reconcile this with one equally important fact — I had enjoyed my first real attempt at research. Perhaps it was my teammates, perhaps it was my advisor, or perhaps the topic itself, but the most enjoyable parts of my academic life at CMU were always the projects (both my core research project and smaller course projects). I haven’t decided if it’s something that I want to pursue as a career (over, say, the applied ML roles that are more common in industry), but it’s definitely a possibility. But if I want to do research, I’d have to come back for a PhD eventually anyway, right?
The Democratization of AI Research
You don’t need to look far to find a number of articles about “Democratizing” AI — making it more accessible to people, bringing it into the hands of individuals from different backgrounds, smoothing out the learning curve, providing more learning resources. Andrew Ng, a pioneer of this revolution, is known for calling AI the “new electricity”. I’d like to argue a point that is related, but subtly different — I believe that there is a visible trend towards the democratization of AI research as well.
Firstly, it is not only courses and basic tutorials that are freely available. As most machine learning research tends to congregate in popular conferences (NIPS, ICML, ICLR, CVPR and others), there is typically no paywall to accessing state-of-the-art research. Most of these conferences make their publications freely available, avoiding the journal pricing issues that plague many fields. And this is without even accounting for arXiv, which has pushed the trend of freely available research to its logical extreme.
Accompanying this trend of freely available research ideas is freely available code. The ML community has taken a firm step to combat the reproducibility crisis that many fields are struggling with, by implicitly encouraging the release of data and code along with publications. You’re often one “git clone” away from playing around with the bleeding edge of the field.
This is supported by the availability of computing power. It’s not as easy to get your hands on your own GPU as it might be to read an online research paper or clone a repository of code, but AWS/Google Cloud/Azure all provide viable alternatives to buying your own. For a pretty reasonable price, you can set up your own GPU server installed with the latest versions of your favourite ML frameworks and toolkits with the click of a button.
And finally, getting your research out into the world is now much easier thanks to arXiv. As others have pointed out, this comes with certain consequences — the most prominent one being a lower signal-to-noise ratio. But I believe that the net impact is very much positive.
So what does this add up to? Let’s say you’ve just decided to explore ML as a field. You can start out with the plethora of freely available online courses to get a basic idea of what ML is. You then move on to research papers, reading up on the most interesting recent advances in the subfield of your choice. You decide to dig deeper into one particular research problem, and you download code from a relevant paper and get it running. You train a state-of-the-art model on an AWS GPU server. You then start playing around with it, tweaking parameters, implementing your ideas. You’re not able to beat the SOTA, but you think that some of your experiments will be useful for future researchers, and you decide to put up a blog post or a report on arXiv explaining what you worked on.
Just let that sink in for a minute. Given the right level of interest, you can go from just exploring the field to being a part of the research community in a matter of months. This is true in general for other fields within CS, but I believe for ML in particular, the barrier to entry into research is astoundingly low. The confluence of readily available resources, data, code, research and computing power has led to the almost blindingly rapid progress in ML (specifically deep learning) in the last 5 years.
Many of the top industry research labs are starting to recognize this trend, and are supporting it through various residency and fellowship programmes that are meant to encourage people from a wide range of backgrounds to get into ML research. The most prominent of these is the Google AI Residency programme, which invites applicants from a variety of STEM backgrounds and pairs them with some of the leading researchers in the field as mentors. The residents are then given the freedom to explore and research ML topics that interest them, with the aim of preparing them for a research career in academia or in industry.
Facebook AI Research has recently announced its own residency programme, which appears to follow a model similar to the Google AI Residency. The first batch of residents begins next year; they will similarly be mentored by senior researchers and allowed to explore AI research. OpenAI has its own Machine Learning Fellowship, again targeting people who are getting into deep learning research by providing mentorship and training.
The requirements for these residencies are telling. The target is typically people with a solid mathematical foundation who are interested in deep learning research, but do not have, say, a PhD in ML. The sheer diversity of past Google Brain residents is a testament to the industry’s attempt to extend the opportunity to perform solid ML research to interested individuals from various backgrounds.
To drive the point home further, here’s another very recent project — the Artificial Intelligence Open Network. It aims to help senior researchers in the field “outsource” their research by bringing together interested people from across the world. It’s potentially a win-win situation — experienced researchers with a backlog of projects can give younger researchers an opportunity to learn and contribute to the community, and both get due credit in the process.
“Decentralized research” sounds almost like an oxymoron. Research has historically been conducted by close-knit groups of highly qualified people working in dedicated organizations or institutions. The idea of organizing a large group of people from around the world towards solving a directed research problem is an innovative one, and certainly one that is only possible in a field like AI, with the free access to resources, computing and data that I talked about before. We’ll need to wait to see how successful it is, but being backed by people like Francois Chollet, Gabriel Pereyra and Hugo Larochelle, the idea seems to be in good hands.
To conclude, I believe that the vast majority of people (myself included) would not thrive in a PhD programme. Given this, you need to start your decision-making process with a slight negative tilt. You start with “I shouldn’t do a PhD” and then you gather enough evidence to convince yourself otherwise. In this process, I actually ended up gathering more evidence to support the initial hypothesis than to reject it. I’ve had to make some difficult decisions in the last 5 years of schooling. Deciding not to do a PhD was not one of them.
Originally published at https://deepakdilipkumar.github.io on December 18, 2017.