When students ask me what skills they should focus on acquiring, I often tell them to look up job postings in their fields and read the lists of required and preferred qualifications. Of course, some of those lists amount to wishful thinking, especially in Big Data, which is a relatively recent field (so it can be rare to find an applicant with the years of experience a posting mentions), but they provide good insight into what companies are looking for. A particularly easy way to look up relevant job postings is to browse the amazon.jobs website.
So below are some skills that seem particularly relevant today. They are obviously not all from the same job posting and, just as obviously, no single posting mentions all of them, so students really should browse postings matching their own interests to get a good idea of the skills they should gain. Since the course I am teaching, Analytics for Decision Support, has no prerequisites, students typically don't have the background to go beyond basic techniques in the topics below (many have never used R and/or never written an optimization model), but I'll try to discuss some of them on this blog throughout the semester, and perhaps readers who already have some preliminary knowledge of analytics will find those posts helpful. So I won't define the terms below today; I am just using this post to make a preliminary list of topics I could cover. Browsing through more job postings, whether on amazon.jobs or elsewhere, may uncover more ideas, but the list below is a good start.
- experience with Python and/or R
- knowledge of SparkML
- experience using libraries such as scikit-learn, caret, mlr, mllib (see the short sketch after this list)
- experience using terabyte-size datasets
- familiarity with data visualization tools
- experience using SQL
- knowledge of AWS AI/ML services, platforms, and frameworks, as well as AWS technologies such as Redshift, S3, EC2, Data Pipeline, and EMR
- skills in Java or C/C++
- working knowledge of smooth and non-smooth optimization methods and associated technologies (CPLEX, Gurobi, XPRESS) as well as high-level modeling techniques (AMPL, R, Matlab); a toy modeling sketch also follows this list
- experience coding in traditional programming languages such as C++, Java, Clojure or Python
- familiarity with writing scripts (Perl, Ruby, Groovy)
- experience with large distributed datasets (Spark, Hadoop, MapReduce)
- also sometimes mentioned: Weka, SAS, Hive, Pig, and time series forecasting
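To give a flavor of what some of these bullets mean in practice, here is a minimal scikit-learn sketch (my own toy illustration, not taken from any posting): it fits a logistic regression classifier on the library's built-in iris dataset and reports accuracy on a held-out test set.

```python
# Minimal scikit-learn example: split data, fit a classifier, measure accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)  # raise max_iter so the solver converges
model.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

The same workflow (split, fit, evaluate) carries over to caret and mlr in R and to MLlib on Spark; only the syntax changes.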
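And for the optimization bullet, here is an equally small sketch of what writing an optimization model looks like, using SciPy's linprog rather than a commercial solver; the product-mix numbers are made up purely for illustration.

```python
# Toy linear program: maximize 20*x1 + 30*x2
# subject to x1 + 2*x2 <= 40, 3*x1 + x2 <= 45, x1 >= 0, x2 >= 0.
from scipy.optimize import linprog

c = [-20, -30]              # linprog minimizes, so negate the profit coefficients
A_ub = [[1, 2], [3, 1]]     # coefficients of the <= constraints
b_ub = [40, 45]             # right-hand sides

result = linprog(c, A_ub=A_ub, b_ub=b_ub,
                 bounds=[(0, None), (0, None)], method="highs")
print("optimal plan:", result.x, "profit:", -result.fun)
```

In a modeling language such as AMPL, or with CPLEX or Gurobi through their APIs, the model would look very similar: decision variables, an objective, and constraints.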