3 Lessons from 3 Years of Data Science

Summary posted by: Reshama Shaikh

Intro

When starting your first data science job, you are equipped with a variety of technical skills that the job requires. Coursework typically focuses on mathematical theory and how to write code. However, some of the most important skills for a data scientist go beyond this knowledge. In this talk, Megan Robertson shares three big lessons learned from working in data science and how you can use these lessons to improve your career.

Video

About the Speaker

Megan is a data scientist based in New York City with multiple years of experience working in industry. She has experience in machine learning, statistical modeling, Bayesian analysis, and more. She currently works as a data scientist at a large company and is writing a book on communication in data science.

Video Outline

NOTE: the timestamps are marked in the video description.

  • 00:00:00 Reshama introduces Data Umbrella
  • 00:04:00 Megan begins talk
  • 00:04:29 Megan introduces self
  • 00:06:03 Project management / Communication / Self-promotion
  • 00:06:48 Project management
  • 00:08:06 The most accurate model is not always the best model
  • 00:11:18 Start simple and evaluate as you go
  • 00:14:28 Learn how to manage computing costs
  • 00:18:48 Communication
  • 00:19:58 It’s important to be able to explain your job to non-technical peers
  • 00:23:04 Break down topics into smaller chunks
  • 00:25:55 Use a reliable and simple example
  • 00:28:59 Self-promotion
  • 00:30:46 You are your own best advocate
  • 00:34:54 Network inside and outside the company
  • 00:39:38 Write a good message and email
  • 00:40:33 Actively think about career development
  • 00:45:32 Summary
  • 00:46:53 Q&A
  • 00:56:02 How did you land your first job after grad school?
  • 00:58:42 How do you choose a domain?
  • 01:01:00 How much skill and knowledge from one domain is transferable to another domain?
  • 01:04:00 Have you experienced gender bias or microaggressions in your career and how have you navigated that?

Full Transcript

Hi, Everyone

Welcome to Data Umbrella’s webinar. My name is Reshama, and I’m just going to go over some housekeeping. Here’s how the webinar will work tonight: I’ll do a brief introduction, and then Megan is going to give her talk. There’s a Q&A tab on the webinar platform, so feel free to post any questions there, and when there’s a good breaking point, Megan will answer them. This webinar is being recorded.

About me

I am a statistician and data scientist. I am the founder of Data Umbrella, and I’m also an organizer for the New York City chapter of PyLadies. You can find me on Twitter at @reshamas. I’m also on LinkedIn and GitHub with the same username.

Data Umbrella Mission

The mission of Data Umbrella is to provide an inclusive and welcoming space for underrepresented persons in data science. We are volunteer-run.

NYC PyLadies (01:16)

This event is co-organized with the New York City chapter of PyLadies. PyLadies is a global organization with over 125 active chapters around the world. You can find out more about the New York City chapter by visiting our website and following us on Twitter. Home Page: nyc.pyladies.com. Twitter: @NYCPyLadies.

Code of Conduct

We have a code of conduct. We just want to make sure that everyone here is professional and respectful, and that applies to anything written in the chat as well. Our whole mission is to provide an inclusive environment, so we take our code of conduct seriously. Thank you for helping make this a welcoming and inclusive community.

Data Umbrella: Job Board (02:09)

I want to share the latest feature on our website: we now have a job board. You can find it at jobs.dataumbrella.org, where jobs are posted. At the top right, which is a little hard to read, there’s an option that says weekly digest or follow. You can sign up there to get weekly emails about the job postings, which is useful for anyone who’s looking for a position.

Highlighted job (02:41)

I want to highlight tonight’s job, which is at Farfetch. Farfetch is in New York City, and they have a Lead Data Scientist position. Visit our website to find out more; I’ll also post the link in the chat so you can click through to the position.

Website (02:56)

Our website, dataumbrella.org, has a lot of resources on open source, accessibility, and responsibility, so feel free to check that out as well.

Data Umbrella (03:10)

These are the best places to find out about what we do. We share events on a lot of different social media platforms, but the best place to find out about our upcoming events is our Meetup group, and this is the link to it. Our website has the resources. We are on Twitter, so feel free to tweet to us, and we are on LinkedIn as well. This webinar is being recorded, and we are going to upload the recording to YouTube, so if you subscribe to our channel, you’ll get notifications when new recordings have been uploaded. We’re also on Facebook.

Let’s get started (03:45)

So, I’m going to hand this over to Megan Robertson. Megan went to school at Amherst in Massachusetts, and she’s in New York now. I’ll let Megan introduce herself.

Welcome Megan (4:01)

3 Lessons From 3 Years of Data Science: Megan Robertson, Data Scientist (4:12)

Cool! Thanks, I’m really excited to see everybody in the chat from all over the place, and I really appreciate you taking the time to come and listen to what I have to say. Today I’m going to be talking about three lessons from three years of data science, going over some of the big things I’ve learned so far in my career. First, a little bit about me.

An Introduction (4:29)

By day, I’m a data scientist working in industry, and I’ve been working with companies of various sizes, including in consulting, for about three years. I want to give a quick shout-out to my current employer’s legal team and clarify that what I’m presenting today are my own views and do not represent the views of my employer at all.

Another thing I’m currently working on is a book on data science communication called “Mastering Communication and Data Science,” which will hopefully be out in late 2021 if all the deadlines are met.

In terms of my background, I have a Master of Statistical Science. In grad school I focused on Bayesian statistics and machine learning, as well as Python and R programming. More of my day-to-day is spent in Python, and that is my language of choice, but I’ve also dabbled in R. I also have a background in sports analytics: I interned for an NBA team when I was in school. My love of sports, and being able to combine it with statistics through sports analytics, is what really got me interested in learning more about statistics and eventually becoming a data scientist like I am today.

Project Management/Communication/Self-Promotion (6:03)

Here’s a quick overview of what this presentation is going to look like. The lessons I’m going to talk about today fall into three big areas: project management, communication, and self-promotion. These have been pretty impactful for me and have helped my career up to this point. I get a lot of requests from students asking questions about data science, and they often ask what I wish I had known when I was starting my career or applying to jobs. These three things usually make up the bulk of my answer. So I’m going to cover what each of these three lessons is, as well as some ways they show up in my day-to-day and how I go about using them.

1. Project Management (6:58)

First, in the area of project management, one big lesson that was really different from school is that the most accurate model is not necessarily the best model. Through project management I’ve also learned a lot about working with people who aren’t data scientists. I could probably list a lot of lessons in this area, but I think this one summarizes most of what I’ve learned. In school you always seem to be trying to fit the best model in terms of some sort of predictive accuracy, whether that’s mean squared error, area under the curve, or whatever metric you’re looking at. I can distinctly remember being in the bowels of the basement the night before a project was due, adding variables and looking for additional data to try to improve that performance. That is not how I approach projects now.

The Most Accurate Model is Not Always the Best Model (08:06)

Why is it important to know that the most accurate model is not always the best model? There are lots of different things you need to consider when you’re developing a model for a project. It’s very rare to be working in a data science vacuum the way you might be on school projects, where it was just you and your group working on something. In my day-to-day, it’s very typical for me to be interacting with a marketing team, a product team, a strategy team, or leadership. First of all, you need to be able to operate within the constraints of deadlines. Sometimes requests come in that need an answer by the next day or within the week. I also get some more long-term asks, but you need to be cognizant of how much time you have to turn the work around, and fitting a really complicated model takes a lot more time than doing something more basic that requires less computing time.

It’s also important to note that as you build more complex models, this can lead to increased costs. Something I didn’t have to deal with much before I started working at a company is that my team now has a budget. It might be great to fit some really complex artificial intelligence, neural network type model, but at the end of the day you might only see a marginal increase in accuracy. So you have to think about the time it takes to build that more complicated model, even if you do see an increase in accuracy. To be cliché, time is money: if I’m spending a lot of my time on something that will only yield a marginal increase in accuracy, that takes away from the other priorities I’m working on. More complex models can also simply be more expensive from a computing-resource standpoint.

The last reason behind this lesson is that black box models, or ones that are more complicated, can be really difficult to explain. This is especially worth noting if you’re working with stakeholders who might be less familiar with data science: at the end of the day, you need to explain how you got to the numbers you’re delivering or the recommendations you’re making. It’s a lot harder to explain a big AI model than something simpler that does the job almost as well. You want to be able to explain these things so that you can build trust with those teams, and I’ll touch on that more when we get to the communication section.

Start simple and evaluate as you go (11:18)

So how do I apply this lesson in my day-to-day? Whenever I start a project, I tend to approach it the same way: by starting simple and evaluating as I go. The first thing that’s really important is to figure out what has been done before, either inside or outside your company. I’ll usually check within my own team, or with other data science teams that work closely with us, whether they’ve built a model or some sort of report that I might be able to capitalize on, and see what I can reuse. Externally, I’ll usually search for white papers or other companies’ data science blogs to see what they’ve applied in a similar situation, and whether I can borrow anything from that or adapt it for my own situation.

The next thing I’ll do is figure out how to establish what I call a baseline. Say this is a modeling project: in that case I would start by creating a really simple model and evaluating it however I plan to evaluate the project, maybe with the mean squared error. This is really important because then you have a score to beat. As you add more variables to the model or look at other analysis options, you have something to compare against. If you were to jump right in and try to make something really complex, you could throw a lot of time into it, and at the end of the day the simplest thing, which took far less time, could actually be your best solution. So it’s really important to get a sense of what the simplest answer would be and what evaluation you can use to see whether you beat it.
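A minimal sketch of the baseline idea, assuming a one-feature regression problem with mean squared error as the metric. The baseline here (always predict the training mean) and the next-simplest model (an ordinary least squares line) are illustrative choices, and the toy numbers are made up; the point is only that the baseline’s score is the one every later model has to beat.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def fit_mean_baseline(y_train):
    """Simplest possible model: always predict the training mean."""
    mean = fmean(y_train)
    return lambda x: mean


def fit_line(x_train, y_train):
    """Ordinary least squares fit of y = a + b * x for a single feature."""
    x_bar, y_bar = fmean(x_train), fmean(y_train)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_train, y_train)) / sum(
        (x - x_bar) ** 2 for x in x_train
    )
    a = y_bar - b * x_bar
    return lambda x: a + b * x


# Made-up toy data, split into a training part and a held-out part.
x_train, y_train = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
x_test, y_test = [6, 7], [12.0, 14.1]

baseline = fit_mean_baseline(y_train)
line = fit_line(x_train, y_train)

baseline_mse = mse(y_test, [baseline(x) for x in x_test])  # the score to beat
line_mse = mse(y_test, [line(x) for x in x_test])
print(f"baseline MSE={baseline_mse:.3f}, line MSE={line_mse:.3f}")
```

In practice the baseline might be whatever simple model your team already has; the structure (fit something simple, record its score, compare everything else against it) stays the same.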

Then, as I go through the project, I’m always looking at the difference in performance. For my projects I’m typically working with hundreds of millions, if not billions, of observations, so I’ll usually fit a model on a smaller dataset, look at the performance, and see whether it will scale. What I mean by looking at the performance difference is this: if a more complex model will require more time and more money to fit, does it perform better? And if the answer is yes, does that difference in performance justify the time and money it would take to scale it? This is definitely a trade-off where you need to think through the pros and cons, and it’s something you’ll get a better sense of as you work on projects and collaborate with other teams to figure out when it’s okay to give up, say, a 0.25 difference versus something bigger.
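The “does the gain justify the cost” question can be made explicit. Below is a hypothetical decision rule, not something from the talk: it compares a simple and a complex model’s held-out error (for example, measured on a smaller sample before scaling up) alongside an assumed estimate of compute hours at full scale, and recommends scaling only if the relative improvement per extra compute hour clears a threshold that your team would have to choose.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mse: float         # held-out error measured on a smaller sample (lower is better)
    cost_hours: float  # estimated compute hours to fit at full scale (an assumption)


def worth_scaling(simple: Candidate, complex_: Candidate, min_gain_per_hour: float) -> bool:
    """Hypothetical rule: scale the complex model only if its relative
    improvement in MSE, per extra compute hour, reaches a chosen threshold."""
    if complex_.mse >= simple.mse:
        return False  # more complex but not more accurate
    relative_gain = (simple.mse - complex_.mse) / simple.mse
    extra_hours = complex_.cost_hours - simple.cost_hours
    if extra_hours <= 0:
        return True  # more accurate and no more expensive
    return relative_gain / extra_hours >= min_gain_per_hour


simple = Candidate("linear", mse=10.0, cost_hours=1.0)
tiny_gain = Candidate("deep net", mse=9.9, cost_hours=21.0)     # 1% better, 20 extra hours
big_gain = Candidate("boosted trees", mse=8.0, cost_hours=5.0)  # 20% better, 4 extra hours

print(worth_scaling(simple, tiny_gain, min_gain_per_hour=0.001))  # False
print(worth_scaling(simple, big_gain, min_gain_per_hour=0.001))   # True
```

The threshold encodes a business judgment, so it is the part worth agreeing on with the teams you collaborate with; the arithmetic itself is trivial.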

Learn How to Manage Computing Costs (14:28)

The other big thing I’ve learned that falls into this area, which I want to call out separately because I’ve learned so much about it outside of school, is how to manage computing costs. When I introduced this lesson, I said that teams are often operating under budgets, and as you make things more and more complicated, it takes more time and more computing power to fit the model and finish the analysis.

Here are some things I’ve found helpful in learning how to manage those costs. First is learning the differences between cluster settings. When I say clusters, I’m referring to these supercomputers that I can access from my personal laptop by logging into them, writing code, and sending it to them. If you’ve worked with Google Cloud computing resources, you’ll use clusters there, and AWS (Amazon Web Services) also has these supercomputers that you can use.

I’ll admit this is something I’m still learning in my day-to-day: really understanding the differences between the types of processing units you can have in a cluster, what it actually means to give it more memory, and what sorts of things are stored in memory.

There’s a lot of material out there. Often the companies that provide these resources publish a library of documentation about them, which can be helpful.

Some of the providers I’ve worked with also offer office hours or have people you can reach out to with questions, which is definitely a great source. But again, this is something I’m still learning, so I recognize that it sounds very easy when I say “learn the differences,” but in reality it’s not always that simple.

The other thing in terms of managing computing costs is thinking about ways to reduce the size of the data. Say, for example, that we normally fit on 100 percent of our training set, but fitting on 75 percent would take less time with only a small reduction in accuracy. So think through all the different ways you could reduce the size of the data you’re running through the model and still get comparable results. You can also ask whether you actually need to include all of these features in the model you’re training; maybe some are correlated enough that you can drop them. Reducing that size matters especially as you move to a more complicated model.

The last thing I’ve definitely gotten better at is taking advantage of the efficiency of parallel computing.
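The two data-reduction ideas mentioned just before parallel computing, training on a fraction of the rows and dropping features that are highly correlated with ones you already keep, could look roughly like this in plain Python. The 75 percent fraction comes from the example in the talk; the 0.95 correlation cut-off, the greedy keep-the-first-feature rule, and the toy columns are assumptions for illustration. The correlation helper assumes no column is constant (a constant column would cause a division by zero).

```python
import math
import random


def subsample(rows, fraction, seed=0):
    """Reproducible random subset containing `fraction` of the rows."""
    k = int(len(rows) * fraction)
    return random.Random(seed).sample(rows, k)


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.95):
    """Greedily keep a feature only if its absolute correlation with every
    feature already kept is below `threshold`; return the kept names."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return list(kept)


rows = list(range(1000))            # stand-in for training rows
train_rows = subsample(rows, 0.75)  # fit on 75% instead of 100%

features = {
    "age": [18, 25, 32, 40, 51],
    "age_in_months": [216, 300, 384, 480, 612],  # redundant with age
    "past_clicks": [3, 1, 4, 1, 5],
}
print(len(train_rows), drop_correlated(features))  # 750 ['age', 'past_clicks']
```

Whether a 75 percent sample or a dropped feature is acceptable is the same performance-versus-cost comparison described earlier: re-check the held-out metric after each reduction.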

When I say parallel computing, here’s an example: say I want to fit a bunch of smaller models and combine their power into a bigger, more complicated model. I could send those smaller models out to a bunch of different supercomputers, and they’ll all run at the same time, which can reduce the time it takes and make better use of the computing power you have.
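A small sketch of the fan-out-and-combine pattern just described: split the data into chunks, fit one small model per chunk at the same time, then combine the models by averaging their predictions. The per-chunk model (a one-feature least squares line), the averaging rule, and the toy data are assumptions. To stay self-contained, it uses Python’s `concurrent.futures.ThreadPoolExecutor` on one machine; because of CPython’s global interpreter lock, threads won’t speed up pure-Python number crunching, so real workloads would use `ProcessPoolExecutor` or, as in the talk, separate machines in a cluster, keeping the same structure.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


def fit_line(points):
    """Ordinary least squares fit of y = a + b * x on (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = fmean(xs), fmean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b


def fit_in_parallel(points, n_chunks):
    """Split the data into n_chunks interleaved chunks and fit one small
    model per chunk concurrently."""
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        return list(pool.map(fit_line, chunks))


def predict(models, x):
    """Combine the small models by averaging their predictions."""
    return fmean(a + b * x for a, b in models)


data = [(x, 3 + 2 * x) for x in range(20)]  # toy data on the line y = 3 + 2x
models = fit_in_parallel(data, n_chunks=4)
print(len(models), predict(models, 10))  # 4 23.0
```

Averaging is only one way to combine the pieces; the speed-up comes from the chunks being independent, so each can be fitted without waiting on the others.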

So again, these are just a few areas to consider if the simple model does okay but you really want to get to that complicated model. Before I get to the next lesson, I’m just going to check the chat real quick to see if any questions have come up; there will also be time at the end. Okay! I don’t see any right now, so we’ll move to communication.

2. Communication (18:48)

In the area of communication, the big lesson is that it’s really important to be able to explain your job to non-technical peers. When you’re in school, you’re typically speaking the same language as your peers and your professors. You can get up, give a project presentation, and throw out all the jargon about parameterizations, loss functions, AI, big data, all of that stuff. But when you’re working as a data scientist, especially in an industry role, you’re going to be working with a lot of teams who aren’t data scientists and who may have taken only an intro stats course. They don’t necessarily need to know the in-depth details, and you’re not going to get up in a meeting and start writing Greek symbols on the board, but you do need to be able to explain the more complex things you’re doing to an audience that doesn’t share your background.

It’s important to be able to explain your job to Non-Technical peers (19:58)

Why is this important? I touched on this first point in the previous section, but it’s really important to build trust and develop relationships with the teams you’re working with. This is especially important if you’re collaborating with a team that might not be as familiar with data science.

That’s because they need to understand how you arrived at your numbers. You can’t just hand over the output of a model and expect them to apply your recommendations blindly; they want to understand where those recommendations come from and need a basic understanding of that.

Being able to explain your job is also really important for educating teams on the data science process. This ties back to the first point, because you want collaborations where you understand their expectations and they understand yours, especially when it comes to timelines and resources. I’ve definitely been in situations where people think, “Oh, data science! I’m just going to take two and a half Excel spreadsheets and email them to you, you put them in your magic black box or crystal ball, and all of a sudden all this great information is going to come out because we’re using the data.” That’s often not how things really happen. Being able to explain how complicated things work helps you educate the people you’re working with on the process and ensures that you have successful collaborations moving forward.

The last point about why this is important is that it can really help you develop a data science brand, if you’re good at going out and talking about what you do, especially to a diverse range of audiences.

This matters within your own company and your own role, but also outside of it. Going back to my previous two points, you don’t want data science to be seen as the team that sits off in the corner, writes all of their code, does math, and then comes out with some numbers. You want people to recognize you as an expert in how they can get the most information out of their data and how it can help inform their strategy. At the same time, you are relying on the expertise of the other teams you’re working with, and you want them to help inform you throughout that partnership.

As for outside your company: if you’re interested in going to conferences and talking about your work, writing blog posts, or building that kind of presence to help your career, it’s really important to have these communication skills as well, and you may even get some new opportunities out of them.

Break down topics into smaller chunks (23:04)

So what are some of the ways I apply this communication lesson? This is a topic I could talk about forever, so I’ll give a high-level view of how I think about it. If you have questions, throw them in the chat, or reach out to me afterward if there’s something you want to talk about more. When I’m presenting something more complicated, I always start by thinking about how I can break the topic down into smaller chunks.

Suppose I have fit a decision tree model to help a marketing team predict whether someone is going to click on an ad. I would think about the different parts of a decision tree individually and then about how to present them all together. The great thing about a decision tree is that it’s basically a flowchart whose structure we figure out using math, and if you’re working with people who have more of a business background, they’ve definitely seen a flowchart.

So in this case it’s really helpful to show an image like that, which they’ve probably seen before. The second point is that a decision tree, at its essence, groups together the most similar observations. At the top of this flowchart we have all of, in this case, our customers of interest, and at each step we’re putting them into groups based on who is most similar at that point.
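One way to make “grouping together the most similar observations” concrete is to look at how a single split is chosen. The sketch below tries every candidate age cut-off and keeps the one whose two groups are most homogeneous in terms of clicks, measured by Gini impurity. Gini impurity is one common criterion (it is the default in scikit-learn’s `DecisionTreeClassifier`), not necessarily the function behind the talk’s model, and the ages and click labels are made up.

```python
def gini(labels):
    """Gini impurity of 0/1 labels: 0.0 means everyone in the group behaved the same."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_age_split(ages, clicked):
    """Return (cut_off, weighted_impurity) for the split `age < cut_off`
    that makes the two resulting groups most homogeneous."""
    n = len(ages)
    best = None
    for cut_off in sorted(set(ages))[1:]:
        left = [c for a, c in zip(ages, clicked) if a < cut_off]
        right = [c for a, c in zip(ages, clicked) if a >= cut_off]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (cut_off, score)
    return best


ages = [19, 21, 23, 24, 27, 30, 35, 41]
clicked = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = clicked the ad (toy labels)
print(best_age_split(ages, clicked))  # (24, 0.0)
```

A full tree repeats this search inside each resulting group, which is why the people who end up in the same bucket are the most similar to each other at that level.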

Lastly, it’s always important to think through the assumptions and limitations of the model or analysis you’re presenting. For example, with any model you can talk about the risk of overfitting, or of applying it to data unlike what it was trained on. If the customers in my data are only between 18 and 35, the model is not really going to apply well to someone outside that age range.
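A cheap guard against the limitation described above is to record the range of each feature seen in training and flag new inputs that fall outside it. This is a hypothetical helper, not part of the talk; the feature names and values are made up, and staying inside the training range does not by itself guarantee that the model applies.

```python
def feature_ranges(rows):
    """Record the (min, max) of each numeric feature across training rows."""
    ranges = {}
    for row in rows:
        for name, value in row.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def out_of_range(row, ranges):
    """Names of features in `row` whose values fall outside the training range."""
    return [
        name
        for name, value in row.items()
        if name in ranges and not ranges[name][0] <= value <= ranges[name][1]
    ]


training = [{"age": 18}, {"age": 27}, {"age": 35}]  # customers aged 18 to 35
ranges = feature_ranges(training)
print(ranges)                             # {'age': (18, 35)}
print(out_of_range({"age": 52}, ranges))  # ['age']
print(out_of_range({"age": 30}, ranges))  # []
```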

It’s important to be responsible about sharing assumptions and limitations, because at the end of the day people may take this analysis, or the numbers from your presentation, and apply them elsewhere, and they need to understand those limits. You don’t want people to take your analysis, run off with it, and start applying it to a bunch of areas where they shouldn’t, so make sure you get that across as well.

Use a Reliable and Simple example (25:55)

The other way I think about communicating more effectively with the teams I work with is to walk through the model with a relatable and simple example. Suppose this slide shows a very simple version of that decision tree for answering the question: will this customer click on the ad? I personally love putting visuals on slides, especially for this, because it gives you something to point to and lets you lay things out clearly for people to see.

For our simple and relatable example, we’ll pick one person and see how the model predicts whether they’re going to click on this ad. Up here at the top, all of our people are in one group: all of the consumers who have been shown some sort of ad. Say we want to predict whether I am going to click on that ad. We would go down to the age level, and here you can remind them how those buckets are split up: this is how we’ve been able to say quantitatively that these people are very similar to each other.

It’s pretty rare, when you’re working with a non-data-science team, for them to ask for specifics like what function you used to define the splits, so I wouldn’t worry too much about those details. Just make clear that the people who end up in a bucket together are the most similar to each other at that level. For me, I would go down to the 26-plus group. Next, customers are shown either an ad on Instagram or one on Facebook. I was shown an ad on Instagram, so we follow the flowchart down and see that among all the 26-plus-year-olds shown an ad on Instagram, only 34 of them clicked. So we predict that I don’t click, because the majority of people in that group did not click.
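The walkthrough above can be written as a tiny hand-built flowchart: split on age first, then on ad platform, and predict whatever the majority in the final group did. Only the structure follows the talk’s example. The under-26 branch is collapsed into a single group for brevity, and every count except the 34 who clicked in the 26-plus Instagram group is hypothetical, chosen so that the majority there did not click.

```python
from dataclasses import dataclass


@dataclass
class Group:
    clicked: int
    did_not_click: int

    def predict_click(self) -> bool:
        # Predict whatever the majority of people in this group did.
        return self.clicked > self.did_not_click


# Hypothetical counts for illustration (only the 34 echoes the talk).
UNDER_26 = Group(clicked=55, did_not_click=45)
AGE_26_PLUS_INSTAGRAM = Group(clicked=34, did_not_click=66)
AGE_26_PLUS_FACEBOOK = Group(clicked=52, did_not_click=48)


def route(age: int, platform: str) -> Group:
    """Follow the flowchart from the top: age first, then ad platform."""
    if age < 26:
        return UNDER_26
    if platform == "instagram":
        return AGE_26_PLUS_INSTAGRAM
    return AGE_26_PLUS_FACEBOOK


print(route(age=29, platform="instagram").predict_click())  # False: majority did not click
print(route(age=29, platform="facebook").predict_click())   # True
```

Walking one person through `route` mirrors walking one person through the slide: each `if` is a box in the flowchart, and each `Group` is a bucket of similar customers.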

If you’re presenting to, say, the marketing team, even if your actual analysis is more complicated, really try to reduce it to a simple example where you can walk through each step. I just walked through a simple version of this decision tree, but maybe in reality I had more variables. This at least gets them to the fundamental concepts: we’re grouping together people who are very similar, and once we reach a final group, we predict what the majority of those people did.


3. Self-Promotion (28:59)

All right, I’m not seeing a question there, so we can move on to the last area and lesson, which is about self-promotion. This is one of the things I’ve learned that has had the most impact on my career, and it’s also one of the most challenging: knowing that you are your own best advocate. I think I got used to school, where if you study hard, really learn the material, and put together a good presentation, that’s reflected in your grade, so at the end of the day good work typically gets noticed. That isn’t always the case at work. This can be especially challenging at a larger company, and it’s not necessarily anything nefarious at all.

When you’re working with a lot of other people, there are a lot of different projects and priorities going on, and sometimes you just have to raise your hand (not literally) and say, “Hey, look at this really hard project I just finished.” I definitely had to get used to promoting myself and talking about my own accomplishments, because I was constantly worried about coming across as bragging, even though I wasn’t. Especially in my first three years, being the more junior person in the room, I would also doubt myself: sure, I finished this project, but I’m sure someone more senior could have done it so much better.

You are your own best advocate (30:46)

As for why this is such an important lesson, I want to start by saying that it’s important to know that you are an asset to your team. Whether you’re still in school studying this and working on group projects, or you’re working as a data scientist now, recognize that your skills are valuable, that you make an impact, and that you would not have gotten to where you are today if someone else hadn’t recognized your potential.

I know imposter syndrome gets talked about a lot, especially in our field, and it can leave you with that little voice in the back of your head saying, “Oh no, I just got lucky, they didn’t read my application that closely, someone else should be here more than me.” I know that’s not always easy to ignore, but take some time to recognize what you’re good at to try to combat some of that.

Another reason being your own advocate is important has to do with mentors and advocates, who are also very important; I’ll talk more about how to go about finding them in the next couple of slides. First, some definitions. A mentor is somebody I think of as providing assistance and helping you with career navigation. I’ve had mentors who are data scientists, and I’ve also had mentors outside of the field whom I typically go to for more general professional questions.

You would go to a mentor to ask something like, “Hey, I’m facing this situation at work, how do you think I should proceed?” Mentors can be really helpful in guiding the next steps of your career and helping you figure out what you want to do. An advocate, on the other hand, is somebody who actively creates opportunities for you and will go to bat for you.

This is typically someone like a more senior member of your team, or even your manager. They know what you want to do, and if something comes up that you’re interested in or that would benefit you, they’ll go to whoever they need to and say, “I support this person; I think they’d be a great fit,” and really help you in that sense.

They could potentially be the same person, but there’s a pretty noticeable difference between the two roles.

The other reason it’s important to be your own best advocate is that it helps with career development. When you advocate for yourself, you’re more likely to move in a direction that really interests you or to work on projects that are more exciting to you. I recognize that career development, especially right after landing my first job, felt very overwhelming: I had applied to so many jobs in the beginning, and when I finally got one, I just wanted to take a breath.

But then it felt like I had to figure out what I wanted to do next and what I wanted to work on. I’m going to touch on career development in the next couple of slides, so don’t feel too overwhelmed by it; I definitely was. You don’t have to have a concrete five- or ten-year plan; there are other ways to think about it. Advocate for yourself so that you end up learning things that interest you and moving closer to your own personal goals.

Network Inside and Outside the Company (34:54)

All right, so how do you go about advocating for yourself? One way is the dreaded networking word: it's really important to network inside and outside your company. The first way to do that is to contribute to any regular summary content. Companies handle this differently, but typically once a week or once a month there's a meetup, a stand-up, or some sort of email or report that highlights the big projects people are working on. If your company does that, definitely think about contributing and talking about the work you're doing. If one doesn't exist but the team or company seems big enough, maybe think about leading an initiative to start one. Especially at larger companies, you want to make sure people recognize what you're doing and see your name in connection with the work you're putting out.

Another point, not strictly about contributing to regular summary content: a lot of companies have an annual review process. Something I do is keep a note on my work computer of the big projects I'm working on and the breakthroughs or new skills I've picked up along the way.

Twelve months go by, and it can be hard to remember everything you did, especially this year, when three months feels like ten years. Get used to keeping track of everything you've done and the skills you've learned, because that comes in handy at the end of the year when you have those conversations with your manager.

Another way to network, and this is something I absolutely hated when I started out, is to set up one-on-ones, or, to use the business lingo, coffee chats. If you're already working, these can be people inside your company who work tangentially or very closely with your team, or even people in other departments.

Or, if you're a student or looking to break into data science, it can be people whose work you're interested in. I started out by reaching out within an alumni network. Whether it's through a college, a boot camp, or any similar program, there's often a directory of people you can reach out to. I found that really helpful to start with, because there was common ground: I already had that connection.

I would also recommend planning some questions and talking points ahead of time. I always found this stressful because I worried, "What if we don't have anything to talk about?" But if you come with questions prepared and get more practice, you'll become more comfortable having those conversations. One of the biggest reasons I didn't like this at first, as a student, is that I always felt like I was asking people for things without having much to offer in return.

But if people respond and are willing to talk with you, especially when you're reaching out as a student, they probably already know it's going to be more about answering your questions.

Also know that you don't have to get very far in your career before the people you network with benefit from knowing you, too. For example, one to two years into my own career, I met someone at an event and ended up helping him find an intern to hire. You don't have to be super senior or have done lots of projects to have a networking relationship where you're also helping other people.

Write a Good Message and Email (39:38)

To restart this last point, since I'm not sure where I dropped off: when you reach out to people, it's really important to do research before you send a message. It's pretty easy to tell when someone is copying and pasting the same message to a bunch of people and just changing the name. So take a little time: read their blogs, look at LinkedIn, and see what information you can add to help yourself stand out and show that you're invested. The reality is that many of these people are probably receiving messages from a ton of people, and you want to differentiate yourself from the group.

Actively Think About Career Development (40:33)

The other way I apply this lesson in my day-to-day is by actively thinking about career development. Again, when I got my first data science job, this seemed very overwhelming. I had finally finished school, gotten the degree, and landed a job, and I wanted to take a deep breath. Then it seemed like, "No, now I have to think about what the future looks like," which was very overwhelming.

But I'm here to tell you it doesn't have to be that overwhelming, so here are a couple of things I've learned about it.

First of all, in data science there are typically two career paths: you can become what's called an individual contributor, or you can move into more of a management role. This isn't the same at every company, but typically you'll come in as a data scientist, there might be one or two senior levels above that, and then you reach a fork in the road.

One path is individual contributor and the other is manager. Being an individual contributor means you spend more of your time writing code, building models, and working on projects; you might have one or two people report to you.

But at the end of the day, more of your time is still spent coding and on the math and statistics side of things. Someone who goes down the manager path leads a larger team and helps guide long-term strategy and priorities. Managers typically spend more time in meetings: understanding the requests that come in for the team, figuring out whether the team can take on more, and deciding what has priority. They're also responsible for more people and usually play a large part in career development for their team.

That's good to know because, as you work on different projects, it's useful to get a sense of, to my next point, what type of work makes you happy. If you find yourself hating all the meetings around project planning, that's okay, but maybe management isn't what you want.

Just thinking about what types of work make you happy is really important. Again, you don't have to make some crazy five-year plan. In your day-to-day, think about what you enjoy, what your favorite things to do are, what you don't look forward to doing, and what you want to learn more about.

For example, I found that I really liked being the person between data science and the people we work with, even though I still spend a lot of my day-to-day writing code and building models.

I really liked taking the analysis I'd done and using it to help somebody outside data science do their job better, answer their questions, and shape the overall strategy.

I've also found that I really enjoy the opportunities I've had to mentor more junior employees and help them start some of their first projects.

So for me personally, that means I'm probably more interested in the manager path at this point. But again, career development doesn't have to mean having every detail of your next few years planned out. It can be as simple as identifying what you like to do and what you don't.

Once you figure that out, it's really important to share those goals and not be quiet about what you want. This is a great thing to talk about with your manager and some of your other colleagues.

If they know what you want, they can act on it. For example, your manager is often aware of opportunities and upcoming projects before you are. If they know that Megan really likes working with more junior employees whom she can mentor, they can say, "Here's a project where she could take on more of a leadership role."

So don't be afraid to talk about it; you want other people to be aware of what you're interested in.

Summary (45:32)

So, to wrap up, here's a quick summary of the three lessons.

First, better accuracy doesn't necessarily mean a better model. At the end of the day, your best solution is the one that meets the project's needs while making the best use of time and money relative to your other priorities. That's a lot easier said than done, but you'll get a better sense of it as you do more projects. Second, it's really important to develop strong communication skills. You need them to build relationships with other teams and a sense of trust among the people you work with, and they're also great if you want to pursue opportunities such as speaking at conferences or meetups.

Lastly, know that you are your own best advocate. Even when imposter syndrome creeps in and you doubt yourself, don't be afraid to speak up for yourself and toot your own horn about your accomplishments. Go after what you want, and make sure you're sharing your career goals with the people you work with.

Questions (46:53)

That's what I have for you today. Here are my Twitter handle, @LEGGOMYMEGG04, and my website, MEGROBERTSON.WEEBLY.COM. If you have any questions or anything you want to chat about, you can DM me on Twitter or use the contact form on my website. I'm also happy to take whatever questions we have time for now.

Moderator: Hi, Megan. There's a question in the Q&A from Sandeep, related to the first section: could you highlight some of the at-scale computing methodologies you found to be key, and the best ways to learn them?

Yeah, for sure. For me personally, the most helpful thing I've learned has been PySpark. The best way to learn it today is a little different from my experience, because when I was in school there wasn't much focus on big-data computing. I learned a lot of it by being handed a problem at work that we had to get done somehow, so it was kind of trial by fire.

But, as I was saying before, one good thing to learn more about is parallel computing, and PySpark is something you can use for that. PySpark is the Python API for Apache Spark, and it can help you run things in parallel.

Like I said, a lot of what I learned came from needing to figure things out. But I'd definitely recommend looking into parallel computing and Spark, because there's a lot of material out there explaining how it works and what its terminology means. Then just go and find some large data set.

There are machine learning data repositories out there with good data sets you can just download. Then start writing code and compare: what happens if I fit a model one way versus using PySpark to run it in parallel? Being able to see that difference is valuable. Off the top of my head I can't think of any particularly good resources, but that's because I had to learn this by trial by fire.
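To make the serial-versus-parallel comparison concrete, here is a minimal sketch of the partition-then-combine idea that Spark is built around. It is not PySpark: it uses only Python's standard library so it runs anywhere, and the function names (`partition_stats`, `combine`, `parallel_fit`) and the synthetic data are hypothetical. Each partition is summarized independently (the "map" step), the summaries are added together (the "reduce" step), and a simple least-squares line is solved once from the totals. This loosely mirrors what a PySpark job built from `rdd.mapPartitions` and `reduce` would express. One assumption to flag: CPython threads do not speed up pure-Python arithmetic because of the global interpreter lock, so this shows the structure rather than a real speedup; for that you would swap in processes or an actual Spark cluster.

```python
"""Partition-then-combine least squares: a stdlib stand-in for a PySpark job."""
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence, Tuple

Row = Tuple[float, float]
# Sufficient statistics for simple linear regression, in this order:
# (count, sum of x, sum of y, sum of x*y, sum of x*x)
Stats = Tuple[float, ...]


def partition_stats(rows: Sequence[Row]) -> Stats:
    """Map step: summarize one partition of (x, y) rows."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return (n, sx, sy, sxy, sxx)


def combine(a: Stats, b: Stats) -> Stats:
    """Reduce step: summaries from different partitions simply add up."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(stats: Stats) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept from summed stats."""
    n, sx, sy, sxy, sxx = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


def split(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Deal rows round-robin into partitions, loosely like repartitioning."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def parallel_fit(rows: List[Row], num_partitions: int = 4) -> Tuple[float, float]:
    """Summarize each partition concurrently, then combine and solve once."""
    parts = split(rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        summaries = list(pool.map(partition_stats, parts))
    return fit_line(reduce(combine, summaries))


def serial_fit(rows: List[Row]) -> Tuple[float, float]:
    """Baseline: a single pass over all rows in one thread."""
    return fit_line(partition_stats(rows))


if __name__ == "__main__":
    data = [(float(x), 2.0 * x + 1.0) for x in range(1000)]
    print("serial:  ", serial_fit(data))
    print("parallel:", parallel_fit(data))
```

The design choice worth noticing is that the per-partition summaries are small and can be added in any order, so no partition ever needs to see another partition's rows. That property is what lets the same computation scale out to many machines.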

Question on confidential information (49:41)

The next question is from Sandra: how do you share your accomplishments outside your company without revealing proprietary or confidential information? Yes! This one hits a little close to home for me, per my disclaimer at the beginning. I typically start by thinking about what my accomplishments are: if I could say anything, what would I want to put out there, for example as a LinkedIn bullet point? Then I work backwards: what do I know I definitely can't talk about?

Once I have a version with the proprietary or confidential details removed (not exactly watered down, just safe to share), I'll typically ask my manager or talk to the appropriate department if I need to. I know this sounds like a vague answer, but because I'm technically giving an external presentation, the answer I give here is very different from what I would tell you individually. As an example, maybe you worked on developing a new feature and it impacted X customers. Quantifying your work that way is usually fine, as long as you're not talking about the underlying model or technology you used.

Technologies used on the job (51:40)

Okay, the next question is from Kenisha: when you first started in industry, were there any technologies or techniques necessary for the job that you didn't know how to use? If so, how did you learn them: through company training or self-training?

Yeah, for me the biggest thing was figuring out how to work with very large data sets. I think I had maybe one lecture in grad school where the professor said, "This is AWS," and I honestly didn't remember anything from it. So learning cloud computing was really important. I learned it through a combination of people on my team with a background in it helping me out, and diving into projects where I needed it, looking at code in blog posts or elsewhere and applying it.

There's a lot more out there now about how these different technologies work, and that's definitely one area I've learned a lot about on the job. The other thing would be database querying; I personally use SQL. I had done the SQL course on Khan Academy or something similar, which was very good, but it wasn't until my first internship, where I was actually generating features from the data and joining tables, that I really felt I'd mastered SQL. So for both of these, it was a combination of doing an online course or reading blog posts and then having to implement things on my own.
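As a small illustration of "generating features from the data and joining tables," here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables, their columns, and the rows are all hypothetical; the point is the pattern of joining an entity table to an event table and aggregating into one feature row per entity. The same SQL works, with minor dialect differences, on most databases.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
SETUP = """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, signup_year INTEGER);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 2018), (2, 2019), (3, 2020);
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 30.0), (12, 2, 15.5);
"""

# One feature row per customer. LEFT JOIN keeps customers with no orders;
# COUNT(o.order_id) counts only matched orders (COUNT(*) would count the
# NULL row produced by the join), and COALESCE turns the NULL aggregates
# for order-less customers into 0.
FEATURE_QUERY = """
    SELECT c.customer_id,
           c.signup_year,
           COUNT(o.order_id)          AS n_orders,
           COALESCE(SUM(o.amount), 0) AS total_spend,
           COALESCE(AVG(o.amount), 0) AS avg_order
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.signup_year
    ORDER BY c.customer_id
"""


def build_features(conn: sqlite3.Connection) -> list:
    """Run the feature query and return a list of row tuples."""
    return conn.execute(FEATURE_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = build_features(conn)
for row in rows:
    print(row)
```

The `LEFT JOIN` versus `INNER JOIN` choice matters for modeling: an inner join would silently drop customer 3, who has no orders, even though "no orders yet" is often one of the most informative feature values.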

Q: Internship (53:44)

Okay, thank you. The next question is from Kailande:

Do you think it's essential to have a data science internship before landing your first full-time position, specifically for people with non-computer-science backgrounds but with data experience in R and Python through quantitative research?

Yeah. I think an internship is definitely helpful, but it's not necessarily a deal breaker; that depends on the company. When a company is looking to hire a data scientist, they really want somebody with a quantitative background: an understanding of probability, intro stats, maybe a little machine learning. It's great that you know how to code. But if you can demonstrate the skills to take a business question or request, think about it, translate it into something quantitative you can investigate with the data you have, run whatever analysis makes sense, and then communicate the results back to whoever you're working with, that's fundamentally what a data scientist does, and that's what companies are interested in.

So yes, an internship can be a great way to get that experience, but you can also demonstrate it in other ways. One of the most common questions I get is, "I'm trying to find my first data science job, but how can I get experience if I don't have any?" Think about the work you're doing in school, in your free time, or through Kaggle. Ask yourself: where can I take a data set, find an interesting story using my technical skills, and communicate what's important about it?

So again, data science projects like that will help, and an internship will help, but not having one isn't necessarily a deal breaker.

Q: First job (56:02)

Okay, the next question is from Neely: could you share more about how you landed your first data science job out of grad school?

Yes! I'll say that I started out in a very inefficient way. Coming out of school, I thought, "I'm just going to apply to as many places as possible, because somebody will respond eventually." That was true, but it took a lot of effort and time to submit those online applications, and I wasn't seeing the return I wanted. Partway through that process, after reaching out to classmates and other people I knew in the area, I shifted my approach toward building a network as a way to find jobs. The reality is that when a company posts a data science position, it typically gets inundated with applicants.

So even if you are the most qualified applicant, your resume might get buried because so many people are applying.

So I shifted my focus toward networking, but networking with a purpose: I made sure I was connecting with people who worked at companies, or in specific areas, that I was interested in.

Again, I do not recommend going out and asking random people to refer you to a job.

But I found that if you have conversations and ask people what it's like to work at their company, a referral often comes up organically: they'd say, "If you see a position you're interested in, let me know."

Sometimes there wasn't a position open at that company, but down the line something came up that I could talk to them about.

So I did get job offers from just applying on websites, but I found that my time was much more effectively spent on the networking approach.

And while I was still job hunting, I also picked up a lot of great information about working in data science along the way.

Q: Deciding on a domain to focus (58:42)

Okay, the next question is from Sandeep: could you provide some pointers on deciding on a domain, even if I don't have much experience in it? I'm sure you found it cool to get a job in a domain that interested you from the outset.

Yeah, definitely. One of the cool things about data science is that almost every company uses it; you can be in retail, sports, medicine, all over the place.

My initial job right out of school was at a now-defunct company that tracked engagement for online publications, which wasn't something I was personally super interested in.

But I saw the opportunity: I knew I wanted a position where I could work with a lot of different teams and help them do their jobs better. A big part of it, too, was that when I interviewed and met the people I would eventually work with, I could tell they were people I would get along with personality-wise.

I could also tell that the head of the team was very invested in building out the data science team within the company and in helping his employees develop.

So if a job is in an industry you're not familiar with, get a sense of what sort of projects the team is working on. At my first job, I was definitely not passionate about online publication engagement metrics.

But there was a really big benefit in working with that team and on its projects. I obviously learned new things about that area, but also things that really interested me, like techniques I hadn't learned before.

Yeah, and just to add to that, one thing I've learned is how much of the skills and knowledge from one domain are transferable to another domain. For sure!

Q: Writing effective LinkedIn bullet points (01:01:00)

The next question is from Neely (yes, we have time for it): do you have any tips on writing good, effective data science LinkedIn bullet points?

Yeah. A common piece of advice that I agree with is to find a way to quantify the projects you've worked on. For example, maybe you did a project that influenced a marketing campaign reaching 50,000 people.

Or maybe you implemented an A/B test, and your recommendation ultimately led to an increase of X dollars in revenue. Definitely put that in, because it helps when somebody less familiar with data science is reading it, for example someone in HR who is in charge of hiring for a position. Seeing a number gives them at least a sense of the impact.

I'd also highlight teamwork, which I feel can get overlooked: talk about the teams you've worked or collaborated with. Unless you're in a super research-heavy position, you're going to be working with people who aren't data scientists, so employers want to know whether you have those communication skills as well.

Whenever I'm updating my resume, I'll try to add a bullet point that focuses on some of those non-technical skills.

The last thing I'll say about data science LinkedIn bullet points, or bullet points on your resume: definitely mention the coding languages you used for a project, or the models and algorithms you used. But if you mention them, make sure you can answer questions about them. I've interviewed people who said, "Oh yes, I fit this, this, and this model," so make sure you can then answer questions about the math behind that model or how you went about fitting it.

So if you have bullet points from earlier projects, review them so they're fresh before you go into an interview, because we definitely look at LinkedIn profiles as well.

Q: Gender bias and microaggressions (01:04:00)

There's one more question here: have you experienced gender bias or microaggressions in your career, and how have you navigated that?

Yeah. Being a woman in a usually male-dominated field, this has definitely come up; I can think of many different experiences that fall under it.

One example is from grad school: a group of us was working on a problem set, and I suggested, "No, we should try this," which turned out to be right. My friend was the only one who listened to me, and then a male student in the group said the same thing 30 minutes later, and I was like, "Wait, I literally already said that."

But I think it's definitely getting better. Especially with some of the events that have happened this year, people are more likely to recognize that those who tend to be a minority in a space do get treated differently at times, and that this isn't always acknowledged when it happens.

For me, first, it's been very helpful to find, not exactly support groups, but people like one of my mentors. I often talk to her about this, because she's in a more senior position than me, and that helps in terms of asking, "Okay, am I crazy?" Having other people to talk to who are also experiencing it is great.

Second, I've definitely gotten better at figuring out, when something happens, the best way to address it and when to bring it up.

It can be uncomfortable at times, and it's definitely not easy, but one of the biggest things I've gotten more comfortable with relates to meetings, where I found that I would get interrupted a lot, or that people weren't paying as much attention to my points.

Again, this isn't easy and can be uncomfortable, but I got better at it: if someone tries to cut me off and I'm not done, I'll just keep talking, and usually they'll get the hint.

Identifying people you can trust within your team can also help, so that you have an ally.

For example, in one of my previous roles, a male colleague, without my ever asking, would step in when I got interrupted in meetings and say, "Well, wait, Megan wasn't done talking; let's hear what she has to say." Afterwards I told him, "Wow, thank you, I really appreciate that."

So find people you trust who can help you with some of that, but also recognize that at times you might have to have that uncomfortable moment of just continuing to talk in the meeting while someone interrupts you. Another thing I'll do from time to time: if I say something and then, classic, someone else repeats it and everyone says, "Oh wow, that's great," I'll ask, "Can you highlight how that's different from what I just said?"

So again, it's not easy, but try to find that support system so you have a place to vent about it, because that can be therapeutic.

Q: Deep learning frameworks (01:08:45)

The next question is from Astato (sorry if I mispronounced it): did you work on a project that uses deep learning models? If yes, what takeaways can you give about deep learning frameworks, and which one would you recommend?

I have done some work with deep learning models; I've had one or two projects where I was building various neural networks. But I don't have much experience taking them from the development stage into production, because they didn't really work for our use case, so I don't have a specific recommendation about deep learning frameworks right now.

I do know some people who use TensorFlow and really love it, but like I said, in my own experience I didn't make it past the exploratory stage.

Well, Megan, thank you so much for presenting. We had a lot of questions and a lot of interest; it was a really great presentation. Did you have any final words you wanted to say?

I think these three big themes really summarize what I've learned so far. I want to say thanks to all of you again for showing up and asking some great questions.

Feel free to reach out; my contact information is up here. And know that it can feel very overwhelming, especially early in your career in data science.

Try not to get too discouraged by what you see other people doing. If you follow what you're passionate about and what you want to learn, things will fall into place, and you'll end up finding something you like doing. Okay, thank you.

Acknowledgments

We would like to thank Sethupathy for submitting the transcript.