How to Get a Job in Deep Learning (deepgram.com)
316 points by stephensonsco on Sept 22, 2016 | 86 comments


TL;DR: Deep Learning will become a commodity. Software will eat Deep Learning too.

I'd like to clear the air a bit amid the hype fog:

DL gives amazing results only when you have big sets of labelled data. Hence it will be much cheaper for companies to buy Google/Microsoft Vision/Audio REST APIs rather than paying the combined costs of cloud + finding data + deep learning experts. So I don't think we will see massive growth in DL gigs.

e.g. Google Vision API: https://cloud.google.com/vision/
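To make the "buy the REST API" point concrete, here is a hedged sketch of what calling such a service looks like. The request shape follows Google Cloud Vision's documented `images:annotate` endpoint; the image bytes are fake placeholders, and the actual POST (which needs an API key and a billing-enabled project) is left commented out:

```python
import base64
import json

# Sketch: build a LABEL_DETECTION request for Google Cloud Vision's
# v1 images:annotate endpoint. The image bytes below are placeholder data.
def build_label_request(image_bytes):
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "LABEL_DETECTION", "maxResults": 5}],
        }]
    }

payload = build_label_request(b"fake-jpeg-bytes")
print(json.dumps(payload)[:40])

# A real call would then be roughly:
# requests.post("https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY,
#               json=payload)
```

The point stands on its own: the "deep learning" part of the product is a dozen lines of glue, which is exactly why it commoditizes.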

Except in those areas where your own CNN implementation is needed (automotive, industrial automation), Deep Learning will be another "library" in the ever-increasing Software Engineering mess of gluing many open source libraries and REST APIs together to get something useful done. You need 1 guy training a Neural Network for every 100 software monkeys maintaining the infrastructure complexity. There are now many Software Engineering jobs because it's hard to glue and maintain publicly-available code to solve some specific business problem.

I think the same applies for many Data Scientist jobs, which are these days more about fetching/cleaning/visualizing data than doing machine learning on it.


This is also one reason why TensorFlow is open. Yes, Google wants it to become the standard, but it's also not a competitive advantage.

The advantage is not the Deep Learning algorithm.

Some theoretical progress has been made on Neural Networks lately, but it's largely the same stuff from the 90s, with many more GPUs and much more data.

The competitive advantage is the cloud, and the software mess that keeps it alive.

I think Deep Learning experts will be like Linux Kernel experts. You need 1000 kernel experts in the world, but you need 10 million javascript monkeys that code what dialog message appears when the user does something stupid in some app.


Or backend monkeys that code what JSON to serve when the UI is managing the user interaction and using the server as dumb data source/sink...


Good reply. At first the sarcasm (usage of "monkey") was almost lost on me. GP seems to have a very condescending attitude towards high-level programmers. How would they feel if they were called a manager monkey, a deep learning monkey, or simply a monkey monkey?


Compare our industry's attitude to how doctors and architects are treated.

Does anyone call a doctor a "diagnosis monkey" because she isn't in the business of inventing antibiotics? Is an architect a "wall-placement monkey" because he doesn't create new kinds of construction materials himself?

In the software industry, professionals who are just as important as doctors and architects routinely think of themselves as mere monkeys or data plumbers at best -- and the reason is this never-ending cycle of nerd hazing.


>routinely think of themselves as mere monkeys...nerd hazing

That phenomenon exists too. But I think more prominent is the derogatory usage by others, especially when coining terms like "<xyz> monkey". For that matter, I am okay with critiques of techs/practices, e.g. JavaScript, but I definitely don't like derogatory usages like calling the practitioners JS monkeys, which was the case here. There's a difference between attacking techs/practices and attacking people.


When I started working in IT, the "monkeys" were everyone who couldn't write Assembly...


Just to give a different perspective: we are an on-prem shop that sells to banks, telcos, etc. who can't use the cloud for compliance reasons. We make it work by doing other things people need in those environments well. One example is my willingness to do an install via DVDs rather than "the cloud".

We mainly do fraud detection and security related work. We have also seen operations workloads (forecasting when machines are going to break, i.e. preventative maintenance). Most of our core business isn't even CNNs.

One thing that's missing from the narrative is that researchers who vision and speech because it's "pretty" and you can demonstrate results on a large feature vector. It's also very relatable for normal people.

Most of corporate America (not Silicon Valley) has more traditional things like time series data, not images.

I would say that deep learning use cases aren't being explored by most people, and that there are other areas besides what the self-driving-car marketing is perpetuating.

The other thing I would add here is when companies get bigger they typically need to take core competencies like vision and speech in house. It can be hard to justify outsourcing your core business to google as you grow.

This trend however might change over time. I would love to hear contradictory statements here.

Disclosure: I am a deep learning founder in the same batch as the authors of this blog post.


If you were to use deep learning in-house for smallish problems, would you still use tensorflow plus a batch of GPUs, or something different? I can't take stuff to the cloud for regulatory reasons, and in order to contract w/a startup like yours, I would first need to make a case for it (with useful examples that just need to be improved)


I am the wrong person to ask. We compete with TensorFlow, going after the larger-scale stuff where Hadoop is a requirement and the cloud is a swear word.

I work with .NET and Java shops where running on Windows is a requirement, HIPAA is considered "lite", and they don't know what a GPU is.

Despite the marketing noise that is still most of the world.

Look no further than another comment in this thread that compared Java job postings with machine learning ones.

My biased comment out of the way: if you can convince ops why you need this deep learning thing, then be my guest. Their first question to you is probably going to be: does Dell/HP sell this as a reference box?

It mostly comes down to ROI and what your existing stack is.

My job is hard enough as it is. If you can convince IT to allocate an r&d budget be my guest. That is more or less our specialty. Email is in profile if you want specifics.


Yes, that might work fine, depending on the scale of your problem. Grad students do interesting work with small setups, and companies can too.


"One thing that's missing from the narrative is that researchers who [<--verb missing here-->] vision and speech because it's "pretty" and you can demonstrate results on a large feature vector."


Sorry! Was writing this quickly. Good find.

"do research using vision and speech"

Too late to edit.


For the people who want a job in deep learning: those jobs are definitely around, and they will be for a while.

It's true that DL will probably become just-another-library, but that will happen only once computing becomes extremely cheap on the petaflop scale (it isn't cheap yet). Even after that happens, the people that spend time doing DL now will be trained in a way of thinking that will be in demand for a long time.


There's also a potentially huge number of possible applications for DL or ML in general; someone just needs to connect the dots. So it's not entirely limited to a finite number of big companies who will be hiring. Just having that skillset (or enough knowledge of how it works) could create your own opportunities by going out into the world and finding applications.

Just like how there were countless excel spreadsheet business processes that birthed CRUD applications.


> I think the same applies for many Data Scientist jobs, which are these days more about fetching/cleaning/visualizing data than doing machine learning on it.

Well, creating ML algorithms is a small part of data science. And getting data, and making sense of it, is a highly non-trivial part (though somehow underappreciated). Very often it requires a lot of knowledge, skill and experience with the methods used, the dataset, and statistics in general.

Anyone can plug&play a scikit-learn algorithm within 2 lines of Python, yet it does not make everyone a data scientist. Anyone can copy&paste&run a deep learning network architecture, but without proper care it is likely to overfit terribly to an underprepared dataset.
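To illustrate the "two lines of Python" point, a minimal scikit-learn sketch. Note that the number it prints is training accuracy, exactly the easy-to-get, easy-to-misread figure that separates plug-and-play from actual data science:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# The plug-and-play part really is this short...
X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# ...but this is accuracy on the training set itself, which says nothing
# about generalization, the part that actually needs care and skill.
print(clf.score(X, y))
```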


Sidenote: A common practice is to take a pre-trained model (e.g. on the ImageNet dataset) and only retrain the last few layers for your use case. This way you get a well-trained feature extractor if your task data is similar, and then only train the classifier, which is a lot faster than full end-to-end training.
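A toy sketch of that practice: a frozen "pretrained" extractor plus a freshly trained linear head. The fixed random projection here is purely an assumption standing in for real ImageNet-trained conv layers, and the task is synthetic, but the structure (frozen features, small trainable head) is the one described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Frozen "pretrained" feature extractor: fixed weights, never updated.
# In real transfer learning these would be e.g. ImageNet-trained conv layers.
W_frozen = rng.normal(size=(20, 8))

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)  # ReLU features from the frozen layers

# Synthetic task whose labels are expressible in the frozen feature space.
X = rng.normal(size=(200, 20))
feats = extract_features(X)
s = feats.sum(axis=1)
y = (s > np.median(s)).astype(int)

# Only the small classification head is trained: fast and data-efficient.
head = LogisticRegression(C=100.0, max_iter=1000).fit(feats, y)
print(head.score(feats, y))
```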


I am curious about demand for this skill in the market.

But I just don't see it - machine/statistical/deep learning gigs just seem really rare.

I know this isn't a great metric, but searches on Indeed.com:

  "deep learning" - 873
  "machine learning" - 9,762
  "statistical learning" - 65
  java - 72,802
  javascript - 43,785

Same searches on LinkedIn:

  "deep learning" - 646
  "machine learning" - 6,952
  "statistical learning" - 34
  java - 43,845
  javascript - 30,818

Even the "machine learning" search on Indeed, with its 9K+ results, has 1300+ from Amazon, followed by a much smaller number (in the low hundreds each) from Microsoft, Google, and others (including some that look like staffing companies).

Even in HN's Who's Hiring Sept 2016 thread, the phrase counts are: 14 for "deep learning", 79 for "machine learning".

I completely agree with the idea that being able to use some deep/machine/statistical learning is going to be a toolset that data hackers need to have. I even think that there is a bit of the "build it and they will come" magic waiting out there.

But I think the best way forward is to be working in data and figure out how to generate value with deep learning - this will be much more productive than trying to seek out a deep learning gig in terms of promoting deep learning in the workplace. Heck, that's a suggestion I would be wise to take myself . . .


You can't only look at demand. You also have to look at supply. Most machine learning engineers have a masters at least (many have a PhD), and almost all 'scientist' positions require a PhD. Of course there are fewer deep learning jobs available than front-end developers. That doesn't mean they're not highly coveted.

As an example, one of Fei-Fei Li's recent grad students got multiple job offers upon graduating, one for more than a million dollars a year: http://www.nytimes.com/2016/03/26/technology/the-race-is-on-...


That $1m offer refers to Andrej Karpathy. He's the Jeff Dean of Machine Learning these days.


They are certainly both great within their fields, but one thing Karpathy is particularly good at is social media. While in grad school he was quite active on HN, Reddit, and Twitter. He had fantastic blog posts that made difficult subjects seem comprehensible. Perhaps it's just part of being from a younger generation, but Jeff Dean doesn't share that trait.


Not sure which places you are referring to, but in most parts of the world data scientists/engineers neither have nor require a PhD.

And even more so recently, with there being quite significant demand for them in enterprises looking to establish big data analytics programs.


What did you search to get your numbers?

Here is a graph of something similar: http://www.indeed.com/jobtrends/q-%22Data-Scientist%22-q-%22...

There is definitely more demand for data science than deep learning, and much more demand for software engineering or development than data science. Of course the supply also matters, there are many more software engineers than data scientists. But still, deep learning is a niche skill.


Though they are not representative of the whole, if there's 1 ML job for every ~10 java/javascript jobs (are those really related?), that doesn't sound too bad to me as far as demand for ML. Many of those other jobs are probably not senior level jobs, and every ML job will be.


How do you figure every Machine Learning job will be senior-level?


The best metric is the salary that the position can command.


Maybe a better metric would be growth of these jobs over time?

It could be that demand for deep learning jobs is growing faster than ML jobs, or vice versa.


My question is, it feels like machine learning is reaching its "Rails" stage. You can implement the latest bi-directional NN or LSTM-RNN using a high-level API that already sits on top of another high-level framework. Even beyond the core setup it will handle the peripherals: smart initializations, anti-overfitting measures, splitting up your data, etc.

Do people who implement (albeit real, useful) deep learning systems, but who have no formal machine learning background, who don't really know much or care about implementing derivatives or softmax functions because the frameworks abstract all that away - are these people getting offered jobs?
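For what it's worth, here is the kind of detail those frameworks abstract away: a numerically stable softmax in plain numpy. The max-subtraction trick is exactly the sort of thing you get for free from a framework and get silently wrong by hand:

```python
import numpy as np

def softmax(z):
    # Subtract the row max first: mathematically a no-op, but it prevents
    # exp() overflow, one of the details high-level APIs handle for you.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# A naive np.exp([1000, 1001, 1002]) would overflow to inf; this doesn't.
probs = softmax(np.array([[1000.0, 1001.0, 1002.0]]))
print(probs)
```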


Am I the only one who feels like all that math is required in order to properly implement an ML system? Bugs in ML systems are insidious and tricky, often difficult to find and manifesting in unexpected ways based on subtle issues in the underlying data or mathematics. Having a ground up understanding of why things work the way they do is a requirement to reasoning through all the layers of the system to find and fix issues.
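One concrete defense against those insidious bugs, and a good example of why the math matters, is numerical gradient checking: compare your hand-derived gradient against a finite-difference estimate. A sketch for a least-squares loss:

```python
import numpy as np

def loss(w, X, y):
    return 0.5 * np.mean((X @ w - y) ** 2)

def analytic_grad(w, X, y):
    # Hand-derived gradient of the loss above; a sign or indexing bug here
    # would train "sort of OK" and be very hard to spot from loss curves alone.
    return X.T @ (X @ w - y) / len(y)

def numeric_grad(w, X, y, eps=1e-6):
    # Central finite differences: slow, but an independent ground truth.
    g = np.zeros_like(w)
    for i in range(len(w)):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (loss(w + d, X, y) - loss(w - d, X, y)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
X, y, w = rng.normal(size=(50, 3)), rng.normal(size=50), rng.normal(size=3)
gap = np.abs(analytic_grad(w, X, y) - numeric_grad(w, X, y)).max()
print(gap)  # tiny if the analytic gradient is right
```

A large gap flags a bug you would otherwise only see as mysteriously bad convergence.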

Maybe someday all of that will be abstracted enough that you don't need to know math to do ML (like how now you generally don't need to know machine architecture to program), but how soon will that be?


It might be some time before it's abstracted enough. For example, there are software packages for Finite Element Analysis, but if you don't know the underlying math, you can't do anything more complicated than the basics. Plus, if there's a problem, without that knowledge you can't really debug it. I'm guessing ML will be like that for a while. If you want to do something simple, using a package is fine, but as soon as something goes wrong you'll need that knowledge to figure out what's happening.

For example, for a while there's been work on doing sentiment analysis using machine learning, and the models are typically trained on a data set of movie reviews. It turns out that as soon as you apply such a trained system to anything other than movie reviews the actual results are quite poor, and you might not even catch it.


>My question is, it feels like machine learning is reaching its "Rails" stage

No, I don't think so yet, but even if it did, would it even matter? There would still be a world of difference between the teams that can build a Twitter/LinkedIn/GitHub in Rails and someone who vaguely knows how to string something together because they learned it on Codecademy.


A world of difference or a bootcamp and 6 months of difference? It's about barriers to entry, which many claim are insurmountable for mere common developers. A Rails bootcamp gets you close to being employable, and assuming you code, why not for deep learning too?


This sounds like you're comparing driving a car to flying an airplane. From my understanding, much like with flying, you need a lot of foundational knowledge to work with ML and be productive on your own. You could learn to drive a car, like you can learn to build a Rails CRUD app, over a weekend; you'll still be bad at it, but you can get from point A to B with little investment.

The barrier to entry for AI stuff is quite high, and IMO it will be limited to certain kinds of people who like working on the hard science/math stuff and have a strong enough early education to learn advanced linear algebra and statistics (for example).


Anecdotally, I think flying badly is not much harder than driving a car badly.

In many ways flying itself is easier than driving because you have less chance of a collision.

The really hard parts are takeoff and landing; not sure if one could learn to take off and land a Cessna badly in a week-long bootcamp.


> not sure if one could learn to take off and land a Cessna badly in a week-long bootcamp

Yes, you can. Taking off is extremely easy, you just need to accelerate with the flaps down and pull up at around 90km/h (if I recall correctly, it has been many, many years since I last flew an airplane).

Landing is much more complex, but you can do decent landings[0] in normal weather conditions after a few classes. The course I did (as I said, many, many years ago) had 35 flight hours and I did it in a summer (2-3 classes per week, usually 1 hour of theory and 45 minutes of flying) and you were perfectly able to do the exam (and pass) after it.

[0] Of course, in the eyes of many pilots, a decent landing is one you can walk away from. A good landing is one where you can use the plane again.


Maybe I'm not exactly the case you are asking for, but let me tell you my experience:

I started college in '97 and specialized in robotics and AI. Dropped out during my fifth year (2003) for several reasons (some having to do with the university, others personal/family related). My working experience began at a startup in '99, doing web stuff.

Now, in my country and in the early 2000's there were no jobs in robotics/AI, so I continued doing web stuff and, through the years I've moved up and down the ladder (I was development director of a multinational, left for a senior developer position because I was bored), lived in 4 countries and moved into different sectors and technologies. AI/Machine learning/Robotics were a hobby and I tried to keep up to date[0], but I had very few opportunities to apply that knowledge in my day to day.

In terms of education, I'm a drop out: I went through five years of university and have more than enough university credits for a bachelor degree, but have never felt the need to go back and officially get a degree[1]. I started a master's degree in Ireland and left it after the second module[2].

Now, finally, the point I wanted to make (sorry for the long introduction, I felt it was important to give a bit of background):

About five-six years ago, I got my first official position working full time with AI and machine learning[3] and suddenly found that my maths knowledge was definitely lacking. Even today, although I'm always improving, I find that when talking about deep learning, the maths are definitely my weak point even though I'm able to implement models, systems and tweak them enough to get good results.

So, to answer your question: Yes, I keep being offered jobs. Maybe not as well paid as someone with a similar level of experience and a PhD and definitely not research-oriented. If the company does have a data science team, those jobs are usually working with them to take their models and integrate them with the rest of the product/s.

[0] I went through Andrew Ng, Sebastian Thrun and Peter Norvig courses originally through Stanford Online and I'm constantly following one or two coursera or MITx courses in the area. I also have an account in Kaggle but I don't usually push my results, I'm more interested in the datasets.

[1] I'd need to go back to my country and fight the bureaucracy there while trying to get documents from a university that no longer teaches computer science.

[2] I felt a bit cheated, to be honest: The syllabus looked great, the content was mediocre at best and the papers we had to do had very little to do with the stuff we learnt and even with IT in general. 4K euros per module (for an online master) and the time spent wasn't worth it just to get an official piece of paper.

[3] "Data science", I don't specially like that term and, if anything, I prefer to call myself a data engineer. I don't do science, I don't do research. I can take a research paper, implement it and add all the stuff that makes it useful and production-ready. But I don't call that science.


My advice:

Do not label yourself as a data scientist or machine learning expert. Go for the domain, i.e. become comfortable with the actual data and the methods used there:

- predict land use in aerial imagery - become comfortable with photogrammetry, geography, etc.

- predict biological tissue(s) - become comfortable with specific branches of biology or medicine

- predict $something_relevant

I actually stole this advice from the epilogue of some text about programming, and it really stuck with me. Otherwise your expertise is just too generic and you compete with a big pool of people who call themselves machine learning experts, because they can write a for loop in Bash.


>Speaking of math, you should have some familiarity with calculus, probability and linear algebra

Curious to know if anyone has had success learning/re-learning these as a mid-20s or older adult who works fulltime, and if you could potentially provide a list of books/courses to go through. I personally never learned anything past geometry (in high school). The most advanced math class I took in college was College Algebra. That means I never learned trig or anything past it (so no calc, linear algebra, or probability), and I'm sure most people on HN surpassed me math-wise sometime in high school :)

I've been able to skate by with my embarrassing lack of math knowledge/skills as a developer, but I feel like it's only a matter of time until the mathematical steamroller becomes a serious threat career-wise and I get crushed.


While I did take courses in probability, linear algebra, and lots of calculus, until recently I had forgotten most of the probability and all of the linear algebra I learned in school. As for calculus, I only remembered how to take basic derivatives. In any case, I've been spending the past month brushing up on my linear algebra and probability, and it's been a struggle; but now that I'm motivated and under no time pressure to relearn the material, I find it way more fascinating than I did in college. In fact, I skipped tons of my linear algebra classes because I thought the subject was dry and dull, and I rushed through my probability and stats homework just to get a good grade. I think if you're motivated and can do basic math, you should be able to teach yourself calculus, probability, and linear algebra. It'll be a struggle, but with motivation you'll pick up the concepts.

for probability and stats: https://www.amazon.com/Introduction-Probability-2nd-Dimitri-...

for linear algebra: https://www.amazon.com/Coding-Matrix-Algebra-Applications-Co...

this was my college calculus textbook: https://www.amazon.com/Calculus-7th-James-Stewart/dp/0538497.... I can't comment on whether it was good or not because, by college, I had taken calculus twice, so it was all a refresher.

best of luck! You sound educated enough (yes, I'm judging from the couple sentences you wrote) that I think you won't have any problems acquiring math knowledge with persistence.


Check out my math and physics textbook, it sounds like it would be perfect for you: https://minireference.com/

I also have one on linear algebra: https://gum.co/noBSLA

Piece of advice: don't skip the exercises. It's great to learn and understand math, but you don't really learn the material until you have to solve problems and make use of the math. Since you're a developer, you'll probably also enjoy this short tutorial on basic math using SymPy: https://minireference.com/static/tutorials/sympy_tutorial.pd...
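As a taste of that SymPy tutorial, the calculus behind backprop (derivatives and the chain rule) can be checked symbolically in a couple of lines:

```python
import sympy as sp

x = sp.symbols('x')

# d/dx sin(x)^2 = 2*sin(x)*cos(x): the chain rule, verified symbolically.
d = sp.diff(sp.sin(x)**2, x)
print(d)
```

Being able to mechanically verify a hand derivation like this is a useful crutch while the math is still fresh.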


It's totally possible. Which isn't to say that it's easy. If you go down that road, be prepared to sacrifice your social life for several years.

The most frustrating thing will be going through basics that don't seem connected to your eventual goal of ML; this is potentially a long phase.

I'd recommend Khan Academy for the math basics. Schaum's Outlines series for lots of worked problems. I'd recommend Strang's MIT OCW videos and books for Linear Algebra. I first learned calculus from an economics book, probably best not repeated ;). Bishop's PRML book is a good source on probability, Bayesian stats and machine learning.

I did a part time MSc in applied stats while working full time. Even in the MOOC era, there's nothing like real exams to focus your efforts.


Khan Academy is a great place to start. I second that recommendation.

Although I was pretty confident in my maths up through basic calculus, I began fresh earlier this year starting from pre-algebra on their videos and problem sets.

I took the approach of looking at the mind as a muscle and if I had taken a long break from lifting, I wouldn't jump right back in lifting the same weights. The analogy isn't perfect, but I feel like it helped to reinforce the old basic neural pathways in order to prime my mind for more difficult topics.

EDIT: Also the achievement points aspect helped as a learning tool as well.


Yes, you can easily catch up on the math if you put in the time and it's so much easier now with the resources out there.

You say you didn't take trig? Check this out as one example: https://www.khanacademy.org/math/trigonometry/trigonometry-r...


edX has quite a few good and very good courses on the subject. For example, right now "MITx: 6.008.1x Computational Probability and Inference" is being offered (other courses are available in archived mode), as well as Caltech's "Learning from Data" (theory heavy) course. There also is a lot on Coursera, but the more theoretical university courses on probability theory are on edX. Everything is free - if you ignore the certificates (if you really need to prove you took a course take a screenshot of the progress page).


My maths level was probably worse than yours: I too only paid any attention to geometry at school, plus trigonometry and the single chapter we did on logic. I think I hid under the desk when we were learning calculus.

I graduated from CompSci in 2011 and worked as a proud code monkey since then.

Last year I took up a Master's course in AI (well, "Intelligent Systems") which was chock-full of machine learning modules, as expected [1]. I finished just a few weeks ago.

I struggled a lot, particularly because I did the course part-time and I was only offered an (optional) maths course in the second year after the machine learning and image processing modules- and concurrently with an NLP module.

The maths module helped a great deal and it cleared up a hell of a lot to do with differentiation and linear algebra. I still struggled with the maths module itself and to be honest I got a lot of help on the homework from my roommate who is a maths wiz, but in the end I managed to feel comfortable with the material and to get a good mark (75%) in the maths module.

I think I even impressed my maths lecturer - but that was my coding skillz, haha. Teach was amazed I managed to implement an LSTM RNN in Python [2] (we went over optimisation and had to code a hill-climb/gradient ascent algo, plus an RNN). In truth, once you get the maths down pat the coding is nothing, but I guess mathematicians are crap as coders :P

Anyway, what I mean to say is: yes, totally. You can totally get at the very least comfortable with the material required for deep (or standard machine) learning. And if you've built a good level of coding skill from work you'll find it gives you a big boost, and can even help you clarify some of the maths. Frex, I feel I got a good intuition about optimisation in general from implementing hill climb/ gradient ascent, especially because I spent hours watching it trying to get over a ridge in a datascape and getting stuck every. single. time. the dumb thing XD
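That hill-climb exercise can be sketched in a few lines of numpy. On a bumpy 1-D "datascape", plain gradient ascent from a bad start can stall on a local ridge, exactly the behaviour described above (the function and constants here are made up for illustration):

```python
import numpy as np

def f(x):
    # A bumpy hill: global peak near x = 2, with ridges from the sine term.
    return -(x - 2.0) ** 2 + 0.5 * np.sin(8 * x)

def ascend(x, lr=0.01, steps=2000, eps=1e-5):
    # Follow the numerical slope uphill: the classic hill-climb exercise.
    for _ in range(steps):
        slope = (f(x + eps) - f(x - eps)) / (2 * eps)
        x += lr * slope
    return x

x_star = ascend(0.0)
print(x_star, f(x_star))  # may settle on a local ridge, not the true peak
```

Watching it get stuck like this is where the intuition about local optima (and why people add momentum or restarts) comes from.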

I think you'll find linear algebra in particular the easiest to learn because it's almost just array manipulation, and you've done that in spades. Differentiation is a bit harder, but at some point it clicks ("it's like stepping on a brake pedal" or some such analogy) and it goes swimmingly from there. Probability is not hard either; it's just a form of logic. Other stuff: ymmv, but it's all doable.

As to books- I don't have any recommendations. I think any popular textbook is going to be good enough to get you started :)

[1] How I ended up doing that course- I got into AI via logic programming, that I learned during my degree. Many Prolog textbooks are also AI and particularly NLP textbooks so I thought it would be a shame to let all that kewl stuff I learned by reading them go to waste. At the time I had no idea how hot machine learning is in the industry.

[2] Plug: https://github.com/stassa/lstm_rnn


"A job in deep learning".

It is highly unlikely that you will get a job in which you exclusively use deep learning alone, and not any other ML/AI technique.

Once you learn DL, then, "congratulations... here are 100 other topics you might need to know about before getting a job". http://scikit-learn.org/stable/tutorial/machine_learning_map...


I disagree. I think we might soon be at a point where someone might be able to get a job just knowing how to use CNNs well. Why do I think this? Well .. CNNs have basically licked the problem of image classification. They require a lot of trial and error. So .. I can totally see people packaging this up (e.g. NVidia's DIGITS or TStreamer), and CNN skills become sort of like Word/MS Office for some industrial applications. These people won't get paid 500K .. more like what web developers make. Just my personal opinion.


I think you're right that the barrier is a lot lower now that there is a somewhat successful blackbox abstraction.

In the past you needed to know: to recognize lines, the Hough transform; to recognize polygons, line simplification; to recognize faces, cascades; etc.

Now? You can almost just feed it arbitrary labeled training data and do well without any sort of feature engineering. Just another API to glue.


Of course there will be more packaged applications, but I don't see them being something as low level as a CNN. More like applications of them such as speech recognition, translation, motion control, etc. Things that you can actually treat as a black box and integrate them into a system.

Would someone in their right mind create a self-driving car product with the help of someone who learned deep learning from a blog or YouTube? Probably not.


You're definitely right that most ML jobs aren't DL all day everyday. But if you are working with data that is rich and fairly homogeneous but hard to model (like recorded speech) then you'll probably benefit a ton if you use DL due to the ability to learn underlying representations. If that's what you are up to then you do spend all of your time building and tuning DNNs (all three ML people at Deepgram are doing that at this very second :) )


I just want to be a software engineer without having to burn away evenings and weekends studying the latest shiny thing, continually for the next two decades, just to keep my career afloat. Is that even an option anymore?


Stick to understanding the fundamentals and the cruft on the top layer is easy to grok. Getting good jobs is about who you know most of the time, so just keep your network going.

Don't read too much Hacker News; it kind of becomes stressful. I would try to enjoy your weekends and not worry too much about the market. A lot of good people burn out and have breakdowns trying to keep up with everything that's going on, and become useless anyway. Just know the basics well and learn what you need to during work hours; make the time.

What I'm finding is that in the end, most of the good / important stuff ends up condensed into a nice O'Reilly (or similar) volume that you can read at your leisure later on when the hype has evaporated. If you invest yourself too much in the latest tech constantly, you run the risk of it being redundant / replaced anyway.


> What I'm finding is that in the end, most of the good / important stuff ends up condensed into a nice O'Reilly (or similar) volume that you can read at your leisure later on when the hype has evaporated.

If you are okay with median compensation and median project importance (internally and externally), then sure, wait a couple years when the interest has died down.


That seems like a somewhat entitled mindset. The job market is what it is and if you don't work hard to stay competitive then you'll fall behind, that's just the reality of a high paying desk job in today's economy. We'd all love to do rewarding work for great pay with a healthy work/life balance, but it's just a fact of life that this type of opportunity is not in abundant supply.


Yeah, this


> I built a twitter analysis DNN from scratch using Theano and can predict the number of retweets a tweet will get with good accuracy

I imagine a product like this could actually charge a fair bit of money helping companies and people improve the 'virality' of their tweets.


This is just a nit, but Andrej Karpathy was never a professor at Stanford; he received his PhD from Stanford and now works at OpenAI.


good catch. fixed it


This should be titled "How to Learn Deep Learning".

"How to get a job in deep learning" would include:

- What specific topics will be asked during interviews

- What the interview question format is like

- How to prepare for the interviews

- How to get interviews without a PhD. What do you need to show competence in your self learned skills?


These are definitely great points. Most companies looking for DL/ML talent aren't interested in setting up HR hoops for the applicant to jump through.

They want to see that you did cool stuff before you applied for the job. If you didn't, then you won't get an interview, but if you did, then you have a chance no matter what your background is. Of course, the question of "what is cool stuff?" comes up. If it is building small projects with a little bit of success, that probably won't do it (it might work for larger companies, or companies that only need light ML/DL). But if it is "built a twitter analysis DNN from scratch using Theano that can predict the number of retweets a tweet will get: here's the accuracy, here's a link to my write-up on it, and here's a link to the code on GitHub", then you have a real shot.

Edit: added words similar to this at the end of the blog post.


> So even if you're a beginner with deep learning, you're welcome to apply for one of our open positions

Statements like this contradict what you are saying here. To really build a model that predicts the number of retweets based on the content of the message (not something like the average number of retweets this user has) is very non-trivial. If your threshold for a side project is publishable work [1], it is an unrealistic expectation.

[1] http://homepages.inf.ed.ac.uk/miles/papers/icwsm11.pdf


Speaking for the other side, the point is not that you achieved the accuracy but to understand how you thought about the problem. The ability to decide on a useful hypothesis, formulate the problem around it, and have some way to measure your progress is very, very valuable from the employer's perspective.


Sure, my point was more that this can be demonstrated by a much simpler project. How many candidates have you seen who haven't had a deep learning job before complete such a complicated project?


If you:

- have good problem-solving and coding skills

- spend a month or two learning how to build networks using good libraries

Then you will be able to get a good result with the Twitter task I pointed out. It takes being able to prepare the input data correctly, think about what matters and what doesn't, then synthesize a solution using ready-made DL tools, usually in Python. None of that is complicated esoteric neural net stuff; it is just motivated problem solving.

I'll add that as a big point — you should be able to code, problem solve, and be motivated.
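For a sense of scale, the core of such a network (minus the hard part: real tweet data and good features) fits in a few dozen lines of numpy. This is a hypothetical toy sketch with synthetic data standing in for featurized tweets, not the actual Twitter model:

```python
import numpy as np

# Toy stand-in for the task: 3 synthetic "tweet features" in, a
# retweet-count-like target out. Everything here is made up to
# show the mechanics; real work would use actual tweet data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([[2.0], [-1.0], [0.5]])
y = X @ true_w + 0.1 * rng.normal(size=(200, 1))

# One hidden layer, trained with plain batch gradient descent.
W1 = 0.1 * rng.normal(size=(3, 16)); b1 = np.zeros(16)
W2 = 0.1 * rng.normal(size=(16, 1)); b2 = np.zeros(1)

def forward(inputs):
    h = np.tanh(inputs @ W1 + b1)   # hidden representation
    return h, h @ W2 + b2           # regression output

mse_before = float(((forward(X)[1] - y) ** 2).mean())

lr = 0.1
for _ in range(1000):
    h, pred = forward(X)
    err = (pred - y) / len(X)            # gradient of mean squared error / 2
    gW2, gb2 = h.T @ err, err.sum(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)     # backprop through tanh
    gW1, gb1 = X.T @ dh, dh.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse_after = float(((forward(X)[1] - y) ** 2).mean())
```

The loss drops as training proceeds; the grind in the real project is feature extraction and evaluation, not these 30 lines.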


The important part to mention is that the expected completion time of your project is 1-2 months. I have extensive ML experience (no deep learning though), and I think this project would take me at least 2 months to do well. This is a long time, especially for a beginner side-project. It is good to tell people to do side-projects, but suggesting projects like this is de-motivating for beginners. If someone on your team (who is presumably an expert already) does this side-project I'd be very interested in the results, and on how long it took them - it would make a good blog post. Your proposed beginner side-project should be less complicated than a published paper with 200 citations (the paper didn't use a nn).


Everyone thinks they can download TensorFlow, hook it up to some stock market or Twitter API, hack together a system in a week, and become an ML engineer. It is simply not that simple. It takes many months or years to build that experience.

Sometimes software engineers need to accept they aren't the smartest ones anymore. This is why they can't get the "smart" and "cool cutting edge" ML/DL jobs.


You may not need a PhD or tons of experience to learn Deep Learning, but what about the gap between that and getting a job?


If you are a great software engineer who can solve hard problems yourself, and then you add to that real experience solving your own nontrivial problems with DL, then people will hire you, no questions asked. The reason is that not many people have experience in DL. So companies need good engineers who have the problem-solving mindset and the motivation and creativity to learn new frameworks and tackle the problems they face in awesome new ways. Industry is dying for people like that.

If you are that type of person then you can kill it with just a few weeks of hard study on your own.


> If you are a great software engineer that can solve hard problems yourself [...] then people will hire with no questions asked.

False. Being a great software engineer is not enough to get a job in deep learning. For the same price, or a little more, you can hire an "expert" scientist or engineer with a PhD in CS/stats/ML.


Good point. It should probably say "you'll get an interview". I think you snipped out something pretty important, though. The fact that you have done something nontrivial with DL is a strong signal to the employer. If you just "want to learn more about DL because it seems interesting", employers aren't interested. But if you say "I am a good engineer and I have already done something nontrivial", then you have a much better chance of getting an interview, and then the job.


This probably assumes you're willing to get a job in SV, NYC, Chicago, or Seattle. I doubt I could spend months learning deep learning and have any decent chance of a job in San Antonio (Texas). I don't live there - just an example.


This is applied deep learning. There's a ton of jobs available for taking someone's library on GitHub and applying it to a bunch of data. But other than DeepMind, FAIR, Google Brain, OpenAI, Vicarious, and Microsoft Research, who is hiring for theoretical machine learning? That's what I'm interested in: developing better algorithms that eventually approach AGI.


Allen Institute for AI and many other smaller companies depending on how left field you want to get.


Why is the Machine Learning subreddit so toxic?


There's so much hype. It's attracting masters-of-the-universe type people who would otherwise be, you know, at Goldman Sachs or something.


Just wrote a blog post that I think a lot of folks will like if they are looking for a job in ML/DL.

Would love to hear if I missed something!


One minor point is that you don't mention some fundamentals of the machine learning process, like making sure to evaluate your models on a different data set than you used to train your models.
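That tenet is worth spelling out: hold out a test set before fitting anything, and never evaluate on data the model was trained on. A minimal plain-Python sketch (the function name and split ratio are just illustrative):

```python
import random

def train_test_split(data, test_fraction=0.2, seed=42):
    """Shuffle and split a dataset so the model is never
    evaluated on examples it was trained on."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

data = list(range(100))
train, test = train_test_split(data)
```

Scoring on `test` (which shares no examples with `train`) is what catches overfitting; training-set accuracy alone tells you almost nothing.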

Another point is that this article is really about how to learn deep learning, not how to get a job. I would really like to see some evidence that: "The good news is that basically everyone is hiring people that understand deep learning." Most data scientist jobs I have seen don't require or use deep learning.


It's true I didn't mention this, but I was hoping the coverage in the related links would be sufficient. I would have loved to include a section that lays out some basic tenets, though.


I've begun looking for resources to learn deep learning / machine learning in the hope of getting a job in the field within the next few years. What other areas of math would you recommend brushing up on before starting some of the online resources mentioned in your post?


I can't really add much more in the way of math. Linear algebra and calculus get you really really far.


1st sentence should be: 1. Be a fuckin math whiz.

Had it been, I would have clicked the back button.


Take a look at the primary algorithms used commonly in AI today - nothing exceeds high school level math.

I posted about this earlier today but ML really should be demystified. You can write a lot of commonly used algorithms in 100 lines or fewer. The math is not complex. If you can get past the notation and buzzwords like "deep learning" (it's an artificial neural network, itself a grandiose term) you'll see it's not as daunting as most think.
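To make the "100 lines or fewer" claim concrete, here's the classic perceptron (a single artificial neuron, the ancestor of those deep networks) in plain Python, learning logical AND. This is a textbook illustration, not production code:

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Perceptron learning rule: nudge the weights toward
    each misclassified example until everything separates."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - pred           # 0 when correct, +/-1 when wrong
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# AND is linearly separable, so the perceptron is guaranteed to converge.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data)
```

Fifteen lines of arithmetic, no notation beyond multiplication and addition. Deep learning stacks many of these neurons, but the core update is no scarier than this.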

The reality is most "data scientists" will be working on implementation rather than creation. They'll be working on data sets and error analysis, not creating the next buzzword-laden algorithm.


Most ML algorithms can be treated as an optimization problem.

Convex optimization is not high school level math.
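Fair point, though in day-to-day practice much of that optimization reduces to gradient descent, which is high-school calculus applied repeatedly. A toy sketch minimizing the convex function f(x) = (x - 3)^2, whose derivative is 2(x - 3):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step downhill along the negative gradient;
    for a convex function this converges to the global minimum."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# f(x) = (x - 3)^2 has its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

The theory of why and how fast this converges is where convex optimization earns its reputation; the mechanics of using it are this small.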


If you've got a lock on calculus, programming, and linear algebra, then you've got the skills to understand deep learning. Most time spent working on DL is not in the megamath part, it is in finding good network structures to optimize results.


I think a lot of resources out there at the moment are full of ML jargon and math, and a lot of the new stuff coming out targets the absolute beginner. This kind of sucks. I started using Torch, was familiar with old-fashioned NNs, and just wanted to quickly get up to speed on convolutional networks. It has been a PAIN (resources I find are either too deep or not deep enough). In any case, thanks for posting your links; they were most helpful.


You are right that most resources aren't that great at showing you the way. There is a lot of grinding to get past obstacles.

We are trying to address this soon!



