
Clay

Member
Oct 29, 2017
8,107
Imo. Absolutely. It just depends on the position. I've had analyst interviews where my interviewer asked me how to select the min/max, how to join a table and what's the difference between a left join and inner join. That job was for 80k.

But there are other positions that will ask you questions that don't require table data knowledge. These are a bit tougher. For example:

What is one method to identify duplicate records in a table?
Are you able to produce a second method?
Write a query to extract the month value of the current date
Given that the current date is in UTC, can you write a query to convert the timezone to the same timezone as Los Angeles?
Write a query to extract the first day of the month of the current date
Write a query to extract the last day of the month of the current date.

This job was for less. Just make sure you read the job description. If the HR person did their job, you will know if you are a good fit for the position.

Awesome, thanks! This is really helpful.
 

Irnbru

Avenger
Oct 25, 2017
2,128
Seattle
Imo. Absolutely. It just depends on the position. I've had analyst interviews where my interviewer asked me how to select the min/max, how to join a table and what's the difference between a left join and inner join. That job was for 80k.

But there are other positions that will ask you questions that don't require table data knowledge. These are a bit tougher. For example:

What is one method to identify duplicate records in a table?
Are you able to produce a second method?
Write a query to extract the month value of the current date
Given that the current date is in UTC, can you write a query to convert the timezone to the same timezone as Los Angeles?
Write a query to extract the first day of the month of the current date
Write a query to extract the last day of the month of the current date.

This job was for less. Just make sure you read the job description. If the HR person did their job, you will know if you are a good fit for the position.

Also, to add: I've interviewed people, and honestly, if you understand the basic select stuff, joins, and procedures, everything else can be learned. I think it is more important that I can trust you to solve the problem on your own, however you can. I don't give af how you do it. I look shit up all the time.

I'd say this is SQL from an analyst POV as well. As a BI manager, I'd ask not only about joins but also dig into performance, how to key tables, differences between kinds of databases, and architecture.

E.g.: How would you normalize a table? Tell me your thought process for choosing a distkey.
etc.
 
May 9, 2018
3,600
What is one method to identify duplicate records in a table?
Are you able to produce a second method?
Write a query to extract the month value of the current date
Given that the current date is in UTC, can you write a query to convert the timezone to the same timezone as Los Angeles?

Write a query to extract the first day of the month of the current date
Write a query to extract the last day of the month of the current date.
The bolded questions are notably bad because they depend on specific SQL syntax which may not be consistent across dialects.

(I'm also not sure how you would get the last day of the month in SQL consistently without galaxy-brain shenanigans, e.g. get the first day of the month, add a month, then subtract a day)
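The "first day of the month, add a month, subtract a day" idea from the post above can be sketched as follows. This is only a rough illustration, not any interviewer's expected answer: the function names are made up for the example, and the SQL variant uses SQLite's date modifiers purely because SQLite ships with Python. Other dialects spell this differently, which is exactly the complaint being made.

```python
import sqlite3
from datetime import date, timedelta


def first_day_of_month(day: date) -> date:
    """Return the first day of the month that `day` falls in."""
    return day.replace(day=1)


def last_day_of_month(day: date) -> date:
    """First day of the month, plus one month, minus one day."""
    first = first_day_of_month(day)
    # Move to the first day of the next month, rolling the year over in December.
    if first.month == 12:
        next_first = first.replace(year=first.year + 1, month=1)
    else:
        next_first = first.replace(month=first.month + 1)
    return next_first - timedelta(days=1)


def last_day_of_month_sqlite(day: date) -> str:
    """Same idea in SQL, using SQLite-specific date modifiers."""
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute(
            "SELECT date(?, 'start of month', '+1 month', '-1 day')",
            (day.isoformat(),),
        ).fetchone()
    finally:
        conn.close()
    return row[0]


today = date.today()
print(today.month, first_day_of_month(today), last_day_of_month(today))
```

Doing the date arithmetic in the client language sidesteps the dialect issue, at the cost of not answering the question "in SQL".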
 

JeTmAn

Banned
Oct 25, 2017
3,825
The bolded questions are notably bad because they depend on specific SQL syntax which may not be consistent across dialects.

(I'm also not sure how you would get the last day of the month in SQL consistently without galaxy-brain shenanigans, e.g. get the first day of the month, add a month, then subtract a day)

Those are "Google it" questions, IMO. Version-specific function invocations.
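For the UTC to Los Angeles question, the SQL syntax again depends on the database, so here is a small sketch of the same conversion done in Python with the standard-library `zoneinfo` module (Python 3.9+). It assumes the system has time zone data available; the function name and sample timestamps are just for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

LOS_ANGELES = ZoneInfo("America/Los_Angeles")


def utc_to_los_angeles(moment_utc: datetime) -> datetime:
    """Convert a timezone-aware UTC datetime to Los Angeles local time."""
    if moment_utc.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment_utc.astimezone(LOS_ANGELES)


# Daylight saving time changes the offset, so the same UTC hour maps to
# different local hours in summer and winter.
summer = utc_to_los_angeles(datetime(2020, 6, 1, 12, 0, tzinfo=timezone.utc))
winter = utc_to_los_angeles(datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc))
print(summer.isoformat(), winter.isoformat())
```

Using a named zone rather than a fixed offset like "UTC-8" is the part worth remembering, whichever dialect or language you end up in.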
 
Oct 27, 2017
16,547
Do y'all think the Andrew Ng course on Machine Learning will prepare me for a class on Machine Learning?

The professor who teaches it is tough as hell and I've failed 2 of her courses already, so I'm trying to be as ready as can be.
 

Kelsdesu

Member
Oct 25, 2017
4,463
The bolded questions are notably bad because they depend on specific SQL syntax which may not be consistent across dialects.

(I'm also not sure how you would get the last day of the month in SQL consistently without galaxy-brain shenanigans, e.g. get the first day of the month, add a month, then subtract a day)

I agree. I argued with the person asking because, as you put it, they were SQL-specific. Tbh I think it was one of those "big brain" I-want-to-know-your-thought-process questions that really piss me off.
 

Gazele

Member
Oct 25, 2017
972
Do y'all think the Andrew Ng course on Machine Learning will prepare me for a class on Machine Learning?

The professor who teaches it is tough as hell and I've failed 2 of her courses already, so I'm trying to be as ready as can be.

Hard to say without seeing the curriculum. It definitely couldn't hurt, but you have to weigh what is going to be the best bang for your buck in terms of time. Sounds like a tough professor though.
 
Dec 13, 2018
1,521
Do y'all think the Andrew Ng course on Machine Learning will prepare me for a class on Machine Learning?

The professor who teaches it is tough as hell and I've failed 2 of her courses already, so I'm trying to be as ready as can be.
If it's a tough course, and maybe rooted in more traditional ml, I can't recommend this book enough : http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop - Pattern Recognition And Machine Learning - Springer 2006.pdf

many ml courses cover the first 7 chapters.

it's free and you can find all the support material online. Not a difficult book to parse either.

best of luck
 

madgorillaz

Member
Oct 28, 2017
439
Just wanted to say hello and introduce myself in this cool thread. I've got a decent stats and math background, currently in academia, and I want to transition into data science after graduation. I've dabbled in some ML algorithms here and there -- but don't feel like I'm an expert, or even have an intuitive understanding for a lot. I'm quite solid with regression techniques and time series analysis though.

Looking forward to learning from you all.
 

HarryHengst

Member
Oct 27, 2017
1,047
Write a query to extract the month value of the current date
Given that the current date is in UTC, can you write a query to convert the timezone to the same timezone as Los Angeles?
Write a query to extract the first day of the month of the current date
Write a query to extract the last day of the month of the current date.
Those are some spectacularly bad questions. I've been working in BI/data engineering for 10 years but fuck if I know how to get the last day of the month from the current date. Every time I have to do it I just google it and look on Stack Overflow. I've interviewed people too and I have never asked questions about syntax. If someone knows how joins work, understands indexing, knows how to find duplicates, and knows how to create tables for many-to-many relations, I'm confident that person is able to figure out how to google such questions.
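For anyone who wants to see the concepts listed above (joins, duplicates, many-to-many tables) in one place, here is a minimal sketch using SQLite from Python. The table and column names are invented for the example, and the second duplicate method uses a window function, which needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses  (id INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: one row per (student, course) pair models many-to-many.
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(id),
        course_id  INTEGER REFERENCES courses(id)
    );
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO courses  VALUES (10, 'SQL'), (20, 'Stats');
    -- (1, 10) is inserted twice on purpose to create a duplicate record.
    INSERT INTO enrollments VALUES (1, 10), (1, 10), (1, 20), (2, 10);
    """
)

# INNER JOIN keeps only students with at least one matching enrollment.
inner_names = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT s.name FROM students s "
        "JOIN enrollments e ON e.student_id = s.id ORDER BY s.name"
    )
]

# LEFT JOIN keeps every student; unmatched ones get NULLs, so COUNT gives 0.
left_rows = conn.execute(
    "SELECT s.name, COUNT(e.course_id) FROM students s "
    "LEFT JOIN enrollments e ON e.student_id = s.id "
    "GROUP BY s.id, s.name ORDER BY s.name"
).fetchall()

# Duplicate method 1: group on the columns that should be unique.
dupes_group_by = conn.execute(
    "SELECT student_id, course_id, COUNT(*) FROM enrollments "
    "GROUP BY student_id, course_id HAVING COUNT(*) > 1"
).fetchall()

# Duplicate method 2: number the rows within each group and keep the extras.
dupes_window = conn.execute(
    "SELECT student_id, course_id FROM ("
    "  SELECT student_id, course_id,"
    "         ROW_NUMBER() OVER (PARTITION BY student_id, course_id) AS rn"
    "  FROM enrollments"
    ") WHERE rn > 1"
).fetchall()

conn.close()
print(inner_names, left_rows, dupes_group_by, dupes_window)
```

The second method is handy when you want to delete the extra copies rather than just list them, since it singles out every row after the first in each group.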
 

Kelsdesu

Member
Oct 25, 2017
4,463
Those are some spectacularly bad questions. I've been working in BI/data engineering for 10 years but fuck if I know how to get the last day of the month from the current date. Every time I have to do it I just google it and look on Stack Overflow. I've interviewed people too and I have never asked questions about syntax. If someone knows how joins work, understands indexing, knows how to find duplicates, and knows how to create tables for many-to-many relations, I'm confident that person is able to figure out how to google such questions.
As I said in a previous post, I agree. There was another element to that interview, related to the current job market and being African American, but I'd rather not go down that path and would prefer to keep the vibes in this thread positive.
 

Kelsdesu

Member
Oct 25, 2017
4,463
I have a cell that is stored as type timedelta64[ns], which I am having a hard time parsing since apparently I can't use ".split". It's currently in this format:

0 days 05:14:00.000000000

Do I need to convert it into datetime64 format?

EDIT:
Found a solution using:

df['Time'].astype(str).str[-18:-10]

Bless the Stack Overflow gods for providing this wonderful bounty.
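As an alternative to slicing the string form, here is a rough sketch that formats the timedelta from its numeric value, so it does not depend on how wide the printed representation happens to be (which, as far as I know, is not guaranteed to be the same in every pandas version). The sample values and the `format_hms` helper are made up for illustration, and negative durations are not handled.

```python
import pandas as pd

# Hypothetical sample data standing in for the 'Time' column from the post.
df = pd.DataFrame(
    {"Time": pd.to_timedelta(["0 days 05:14:00", "0 days 00:03:07", "1 days 02:00:30"])}
)


def format_hms(delta: pd.Timedelta) -> str:
    """Format a non-negative Timedelta as HH:MM:SS, folding days into hours."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Iterating a timedelta64 Series yields pandas Timedelta objects.
df["Time_hms"] = [format_hms(delta) for delta in df["Time"]]

# If only one field is needed, .dt.components exposes them as integer columns.
hour_field = df["Time"].dt.components["hours"]

print(df)
```

So no, converting to datetime64 is not required; the timedelta already carries the hours, minutes, and seconds.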
 

Pau

Self-Appointed Godmother of Bruce Wayne's Children
Member
Oct 25, 2017
5,837
My summer internship was cancelled because of COVID-19, and I've had no luck in getting another one. Got to the last round for another one, but because I applied so late the company had already made an offer by the time I turned in my technical assignment. Otherwise, I haven't gotten a single interview from about ten applications. I'm just getting really discouraged.

I don't know if my expectations are just not realistic or if I'm just not a good candidate. I got the previous internship after only applying to about five companies, but really only because I met the recruiter at a career fair. I didn't get an interview for the others. I got my previous data analyst job after about the same number of applications, so I'm just not sure what a "good" or realistic interview per application rate is. I don't know at what point I should seriously reconsider my approach. Out of how many applications do you all get an interview?

Doesn't help that I'm seeing all of my cohort start their internships.

If anyone has any leads or suggestions, I'd really appreciate it.
 
Oct 27, 2017
16,547
My summer internship was cancelled because of COVID-19, and I've had no luck in getting another one. Got to the last round for another one, but because I applied so late the company had already made an offer by the time I turned in my technical assignment. Otherwise, I haven't gotten a single interview from about ten applications. I'm just getting really discouraged.

I don't know if my expectations are just not realistic or if I'm just not a good candidate. I got the previous internship after only applying to about five companies, but really only because I met the recruiter at a career fair. I didn't get an interview for the others. I got my previous data analyst job after about the same number of applications, so I'm just not sure what a "good" or realistic interview per application rate is. I don't know at what point I should seriously reconsider my approach. Out of how many applications do you all get an interview?

Doesn't help that I'm seeing all of my cohort start their internships.

If anyone has any leads or suggestions, I'd really appreciate it.
Try local government, I've seen a lot of DS internships in my city.
 

Pau

Self-Appointed Godmother of Bruce Wayne's Children
Member
Oct 25, 2017
5,837
Try local government, I've seen a lot of DS internships in my city.
Unfortunately, my city doesn't seem to have any open. I could potentially move to NYC for the summer, so I'll check that too. What titles are you seeing? Are the positions mostly up on the local government job portal?
 
Oct 27, 2017
16,547
Unfortunately, my city doesn't seem to have any open. I could potentially move to NYC for the summer, so I'll check that too. What titles are you seeing? Are the positions mostly up on the local government job portal?
I was getting direct links to positions through my school. I've seen both Data Analyst and Data Science positions, both part of the NYC Mayor's office iirc. I'll add some links later if I can find them.
 

Readler

Member
Oct 6, 2018
1,972
So I'm finishing up grad school at the moment and am trying to find a job now (lol), however I am struggling to actually find something suitable.
It seems "Data Scientist" is very much a senior position obviously, so I would appreciate if some of you could outline how you started out.

My background:
B.S. in Computer Science, M.S. in Data Science
Two internships where I first worked as a Data Analyst of sorts, and another one where I designed a data quality framework for the company I was at.
Once I've submitted my thesis, I want to look more into Data Engineering (i.e. MLOps, DataOps, all that cloud malarkey basically) and build a somewhat strong foundation with AWS or Azure, both of which I've got some limited experience with. I also need to refresh my SQL skills, but don't worry I put that on my CV lol

I'm based in Europe btw.
 

Pau

Self-Appointed Godmother of Bruce Wayne's Children
Member
Oct 25, 2017
5,837
I know it's late but I wasn't able to find any of the links, sorry.
Absolutely no worries! :) I did some looking myself, but it seems like they went down fairly quickly. But it worked out anyways as I accepted an internship offer last week. :D

So I'm finishing up grad school at the moment and am trying to find a job now (lol), however I am struggling to actually find something suitable.
It seems "Data Scientist" is very much a senior position obviously, so I would appreciate if some of you could outline how you started out.

My background:
B.S. in Computer Science, M.S. in Data Science
Two internships where I first worked as a Data Analyst of sorts, and another one where I designed a data quality framework for the company I was at.
Once I've submitted my thesis, I want to look more into Data Engineering (i.e. MLOps, DataOps, all that cloud malarkey basically) and build a somewhat strong foundation with AWS or Azure, both of which I've got some limited experience with. I also need to refresh my SQL skills, but don't worry I put that on my CV lol

I'm based in Europe btw.
I'm interested in what folks respond as I'll be in your shoes soon. Good luck!
 

Qwark

Member
Oct 27, 2017
8,017
So I'm finishing up grad school at the moment and am trying to find a job now (lol), however I am struggling to actually find something suitable.
It seems "Data Scientist" is very much a senior position obviously, so I would appreciate if some of you could outline how you started out.

My background:
B.S. in Computer Science, M.S. in Data Science
Two internships where I first worked as a Data Analyst of sorts, and another one where I designed a data quality framework for the company I was at.
Once I've submitted my thesis, I want to look more into Data Engineering (i.e. MLOps, DataOps, all that cloud malarkey basically) and build a somewhat strong foundation with AWS or Azure, both of which I've got some limited experience with. I also need to refresh my SQL skills, but don't worry I put that on my CV lol

I'm based in Europe btw.
This is my story. 10 years ago, in the senior year of my bachelor's degree in Comp Sci (still no master's degree), I was paired up with a local healthcare software startup for my senior capstone project. This itself was not in the realm of data science, but I was able to convert that into an internship, and then eventually a full-time job as a software engineer and now a senior software engineer. Most of my time here has been spent on our reporting team, so creating reports, datamarts, ETL processes etc. Being a startup, we tried to do things cheaply, so a lot of my projects were built with open-source products and it showed: a lot of them were rough. But now we have decent resources and I'm just now getting into more advanced topics like predictive analytics.

I'll be honest, I got really lucky and never really had to interview at all. I got into a startup early on so I was able to learn as the company grew. We're no longer a startup, but I still like it here, it feels comfortable and I like what I'm doing.
 

Pau

Self-Appointed Godmother of Bruce Wayne's Children
Member
Oct 25, 2017
5,837
Any advice for me? I start my first university machine learning class next week
Congrats! I hope you enjoy it!

Do you have a sense for how many pen-and-paper calculations they'll have you do? Or how much of it is coding stuff from the ground up versus using a library like PyTorch or TensorFlow?

If it's the former, these are the math concepts I remember using the most:

Linear Algebra: Mostly matrix multiplication. Even if you aren't doing the calculation by hand, you need to know how to set up and transpose to get the proper dimensions.

Markov Chains: Will be extremely useful once you get to recurrent neural networks.

Partial derivatives
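To make the two points about dimensions and partial derivatives concrete, here is a small pure-Python sketch. It is only an illustration with made-up numbers: the helper names, the "3 samples, 2 features, 4 hidden units" shapes, and the single squared-error term are all assumptions for the example, and a real course would use NumPy or a framework instead of nested lists.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def shape(m: Matrix) -> Tuple[int, int]:
    return len(m), len(m[0])


def transpose(m: Matrix) -> Matrix:
    return [list(column) for column in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    (n, k), (k2, p) = shape(a), shape(b)
    if k != k2:
        raise ValueError(f"inner dimensions differ: {n}x{k} @ {k2}x{p}")
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(p)] for i in range(n)]


# X: 3 samples x 2 features. W: 4 hidden units x 2 features.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]

# X @ W does not line up (3x2 @ 4x2), but X @ W^T does: 3x2 @ 2x4 -> 3x4.
hidden = matmul(X, transpose(W))
print(shape(hidden), hidden)


# Partial derivative of one squared-error term with respect to w1,
# checked against a central finite difference.
x1, x2, y = 1.0, 2.0, 3.0


def squared_error(w1: float, w2: float) -> float:
    return (w1 * x1 + w2 * x2 - y) ** 2


w1, w2, step = 0.5, 0.5, 1e-5
analytic_dw1 = 2.0 * (w1 * x1 + w2 * x2 - y) * x1
numeric_dw1 = (squared_error(w1 + step, w2) - squared_error(w1 - step, w2)) / (2 * step)
print(analytic_dw1, numeric_dw1)
```

Checking a hand-derived gradient against a finite difference like this is a cheap way to catch sign and transpose mistakes before they get buried in a training loop.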
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,034
I made it through some big cuts at my company (oil and gas), fortunately. Wound up staying in more of a developer/integration role, helping improve the accessibility and quality of data previously siloed in monolithic applications. I still have a hand in building up our Python expertise, and have been putting bits and pieces from the big data world to good use (parquet's great), but the cooler stuff like Kafka will be on the back burner for a while until we have the basics covered. Still supporting data scientists in the meantime.
 

Readler

Member
Oct 6, 2018
1,972
I made it through some big cuts at my company (oil and gas), fortunately. Wound up staying in more of a developer/integration role, helping improve the accessibility and quality of data previously siloed in monolithic applications. I still have a hand in building up our python expertise, and have been putting bits and pieces from the big data world to good use (parquet's great), but the cooler stuff like Kafka will be on the back burner for a while until we have the basics covered. Still supporting data scientists in the mean time.
I was affected by this lol

Congrats! I hope you enjoy it!

Do you have a sense for how many pen-and-paper calculations they'll have you do? Or how much of it is coding stuff from the ground up versus using a library like PyTorch or TensorFlow?

If it's the former, these are the math concepts I remember using the most:

Linear Algebra: Mostly matrix multiplication. Even if you aren't doing the calculation by hand, you need to know how to set up and transpose to get the proper dimensions.

Markov Chains: Will be extremely useful once you get to recurrent neural networks.

Partial derivatives
Good stuff.
Also adding to this: if you wanna be really good at it, familiarise yourself with the concepts. You don't have to be able to recite everything on a moment's notice when asked, but learn and understand the mathematical concepts behind each algorithm. Sure, you can prooobably get by by just treating everything as a neat magic black box that forecasts the future (I know a lot from my course who did :P), but I urge you to really get into the nitty gritty - it will help in the long run. For instance, it turns out that neural networks are actually super simple once you know how they work.

I made it to the next (and hopefully final?) round at a startup I applied to. I am now presented with a coding challenge involving time-series data, which I can do anytime, but for which I have an hour to do once I start. According to the mail it's some messy real life data, and they want me to answer some questions. Any advice?
Not gonna lie, I'm a bit nervous, since out of all my applications it's the only one where I got so far, but being on a timer already makes me a bit anxious haha
 

Antrax

Member
Oct 25, 2017
13,268
I was affected by this lol


Good stuff.
Also adding to this: if you wanna be really good at it, familiarise yourself with the concepts. You don't have to be able to recite everything on a moment's notice when asked, but learn and understand the mathematical concepts behind each algorithm. Sure, you can prooobably get by by just treating everything as a neat magic black box that forecasts the future (I know a lot from my course who did :P), but I urge you to really get into the nitty gritty - it will help in the long run. For instance, it turns out that neural networks are actually super simple once you know how they work.

I made it to the next (and hopefully final?) round at a startup I applied to. I am now presented with a coding challenge involving time-series data, which I can do anytime, but for which I have an hour to do once I start. According to the mail it's some messy real life data, and they want me to answer some questions. Any advice?
Not gonna lie, I'm a bit nervous, since out of all my applications it's the only one where I got so far, but being on a timer already makes me a bit anxious haha

It's super simple, but have all the stuff you might want on hand before starting. I'm the queen of "click start FUCK FORGOT A PEN"

Also, try to chill for 30 minutes or so before starting. When I was teaching, that was how I always told my students to sit for any kind of exam or practical. Cramming has limited value, and we see genuine declines in performance when people are stressed.
 

Readler

Member
Oct 6, 2018
1,972
It's super simple, but have all the stuff you might want on hand before starting. I'm the queen of "click start FUCK FORGOT A PEN"

Also, try to chill for 30 minutes or so before starting. When I was teaching, that was how I always told my students to sit for any kind of exam or practical. Cramming has limited value, and we see genuine declines in performance when people are stressed.
Fair enough.
I also thought of just compiling a little document with commonly used lines; I expect to do some EDA, so having a few quick and dirty seaborn plots handy should ease things up I guess?
I just hate these timed assignments, they're the only thing that makes me anxious.

Also - anyone in here working in the London area?
 

Pau

Self-Appointed Godmother of Bruce Wayne's Children
Member
Oct 25, 2017
5,837
Fair enough.
I also thought of just compiling a little document with commonly used lines; I expect to do some EDA, so having a few quick and dirty seaborn plots handy should ease things up I guess?
I just hate these timed assignments, they're the only thing that makes me anxious.

Also - anyone in here working in the London area?
I think having a short "cheat sheet" document is a great idea! I'm someone who still has trouble adjusting to the way Pandas and Seaborn treat groups and cross tabs. After three years of SAS my brain just defaults to that now, especially if I've had any sort of break from programming in Python. I'd suggest having examples for the most common graphs or transformations you'd do, and maybe a few that are not as common but that you know always give you a hard time.

Totally agreed on the timed assignments hate. It doesn't even matter what kind. I've always hated writing timed essays in school, despite being a very strong writer.

Good luck!
 

Readler

Member
Oct 6, 2018
1,972
I think having a short "cheat sheet" document is a great idea! I'm someone who still has trouble adjusting to the way Pandas and Seaborn treat groups and cross tabs. After three years of SAS my brain just defaults to that now, especially if I've had any sort of break from programming in Python. I'd suggest having examples for the most common graphs or transformations you'd do, and maybe a few that are not as common but that you know always give you a hard time.

Totally agreed on the timed assignments hate. It doesn't even matter what kind. I've always hated writing timed essays in school, despite being a very strong writer.

Good luck!
Lol fuck SAS.
But yeah, think that's what I'm gonna do.
And yes! Even looking at my finished master's now, I can clearly see that I got noticeably better marks in project-based subjects, than those where the mark was decided by an exam. Even after four years of Python, I still have to look up even the most basic things haha

Thanks :)

How'd your internship (I believe it was an internship?) go btw?
 

Pau

Self-Appointed Godmother of Bruce Wayne's Children
Member
Oct 25, 2017
5,837
Lol fuck SAS.
But yeah, think that's what I'm gonna do.
And yes! Even looking at my finished master's now, I can clearly see that I got noticeably better marks in project-based subjects, than those where the mark was decided by an exam. Even after four years of Python, I still have to look up even the most basic things haha

Thanks :)

How'd your internship (I believe it was an internship?) go btw?
Did you have a lot of exams involving coding in your program?

I just started my internship a couple of weeks ago, so it's still pretty early on. I didn't expect working remotely to be so strange as a new employee, but there are definitely moments where I feel lonely and lost.
 

Readler

Member
Oct 6, 2018
1,972
Did you have a lot of exams involving coding in your program?

I just started my internship a couple of weeks ago, so it's still pretty early on. I didn't expect working remotely to be so strange as a new employee, but there are definitely moments where I feel lonely and lost.
Not in my master's, no. My undergrad (CompSci) had a lot of them, though. And well, coding under exam conditions (time limit, no internet, sometimes only on paper) just sucks. We had a course on functional programming (Haskell), which I really enjoyed, but coding in Haskell consists of taking up to an hour to think about the problem only to come up with a one-liner in the end. Unlike OOP, it's hard to come up with a half-arsed solution which works but might fail in edge cases and stuff.
For me it's just really the anxiety. I'm not too anxious in my day-to-day, but once the timer starts my thoughts just go haywire haha. I back myself to know everything under normal circumstances.

Ugh, tell me about it. I started my current internship four months ago, right when the Rona properly kicked off. It looks good on a CV, sure, but I didn't get to learn nearly as much as I had hoped. Hope yours turns out better.
 
OP
Raticus79

Community Resettler
Member
Oct 25, 2017
1,034

HarryHengst

Member
Oct 27, 2017
1,047
The more I work with R (and Julia), the more I realize how badly designed the Python data ecosystem actually is. Sure, you can do everything with it, but everything feels like a struggle compared to R's tidyverse. Pandas is a clusterfuck of things joined together with an inconsistent API, and matplotlib requires a mad genius to understand, to the point that people wrote wrappers around it that still don't really solve the issue. Props for PyMC3 though, that is some good stuff. Working with libraries like TensorFlow or PyTorch feels like I'm writing in completely different languages, so that is also bad.

Julia is cool though, they looked at Python and went "alright, let's do that, but faster and with sane syntax for the libraries". The speed at which its ecosystem is developing is impressive, although it is mostly focused on the scientific computing corner. Its biggest problem is that it's missing lots of convenience libraries to make the little things easier. R has a billion libraries for everything, Julia still requires a bit more effort to build your own things. Biggest pro though: 1-based indexes. 0-based makes sense when you're working in C, but it's insane in a data-based environment.
 

Readler

Member
Oct 6, 2018
1,972
Fair enough.
I also thought of just compiling a little document with commonly used lines; I expect to do some EDA, so having a few quick and dirty seaborn plots handy should ease things up I guess?
I just hate these timed assignments, they're the only thing that makes me anxious.

Also - anyone in here working in the London area?
So, just as an update to this: I scored 16% on the assignment, even though those were very easy questions. For some of them I wonder how the expected answers could even have been hardcoded into the scoring (the results of a linear regression, for instance, are bound to vary a bit depending on your split). After not hearing back, I emailed them myself today, only to get the rejection an hour later.

A bit bummed out about it ngl. Especially since the questions were indeed on the easier side of things.
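The point about regression results depending on the split is easy to demonstrate. Below is a rough pure-Python sketch with synthetic data (true slope 2, some Gaussian noise); the data, seeds, and helper names are invented for the example and have nothing to do with the actual assignment. The same data gives a slightly different fitted slope for each random 80% training split, so an auto-grader comparing against one exact number would need a tolerance or a fixed seed.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slope_for_split(points: List[Point], seed: int, train_fraction: float = 0.8) -> float:
    """Shuffle with the given seed, fit on the training part, return the slope."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return fit_line(shuffled[:cut])[0]


# Synthetic noisy data around y = 2x + 1 (fixed seed so the data is repeatable).
data_rng = random.Random(0)
points = [(float(x), 2.0 * x + 1.0 + data_rng.gauss(0.0, 5.0)) for x in range(50)]

slopes = [slope_for_split(points, seed) for seed in range(5)]
print(slopes)
```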
 

Maynerd

Member
Oct 27, 2017
2,522
Redmond, WA
I work for a corporation and my job is not data science related. My team has been using me as their data analysis person as I enjoy data. I have been figuring out data analysis on my own building data charts in Excel and some minor work in Power BI. I would like to get some data science abilities under my belt. Due to COVID my company has put a hold on any training at this time. I'm looking for what could be done at very low or no cost to get things going. I'm not exactly sure where I should start and what I should be focusing on as well. I do know my company uses Power BI and has databases that use SQL.

I think more than anything, I need a clear plan on what I need to learn to be competent and above average from a skill perspective. A list that says, do this, then that, then that, then that, done. LOL

Would appreciate any advice.
 

Irnbru

Avenger
Oct 25, 2017
2,128
Seattle
I work for a corporation and my job is not data science related. My team has been using me as their data analysis person as I enjoy data. I have been figuring out data analysis on my own building data charts in Excel and some minor work in Power BI. I would like to get some data science abilities under my belt. Due to COVID my company has put a hold on any training at this time. I'm looking for what could be done at very low or no cost to get things going. I'm not exactly sure where I should start and what I should be focusing on as well. I do know my company uses Power BI and has databases that use SQL.

I think more than anything, I need a clear plan on what I need to learn to be competent and above average from a skill perspective. A list that says, do this, then that, then that, then that, done. LOL

Would appreciate any advice.
As someone who did this transition (I was in finance / program management): SQL, Power BI/Tableau, Python (Pandas). Start from the ground up. I ended up getting a master's in business analytics and am now a Sr. Business Intelligence Engineer/Manager.

Dig into your company's data architecture and try to get involved in any systems integration projects as well. You want to understand process and data flow.
 

Readler

Member
Oct 6, 2018
1,972
I got invited to a 30min technical interview for next week. I genuinely don't know what to expect and I have just been reviewing a bunch of ML concepts and preparing to explain them when needed (e.g. "How does a Random Forest work?").
Anyone want to share their experiences? It's for a consultancy infamous for being hard to get into so I really really don't want to bottle it lol
 
Oct 27, 2017
3,665
I work for a corporation and my job is not data science related. My team has been using me as their data analysis person as I enjoy data. I have been figuring out data analysis on my own building data charts in Excel and some minor work in Power BI. I would like to get some data science abilities under my belt. Due to COVID my company has put a hold on any training at this time. I'm looking for what could be done at very low or no cost to get things going. I'm not exactly sure where I should start and what I should be focusing on as well. I do know my company uses Power BI and has databases that use SQL.

I think more than anything, I need a clear plan on what I need to learn to be competent and above average from a skill perspective. A list that says, do this, then that, then that, then that, done. LOL

Would appreciate any advice.
Touching on what Irnbru mentioned, the single biggest thing you can do which smaller companies in my experience regularly neglect is pulling information directly from the database to run automated reports. Knowing the data flow inside-out and becoming very comfortable with all of the relevant databases present in the organisation can have an absolutely monumental impact on what you do.

Becoming familiar with SQL (whatever variety is most relevant) is very easy to do for basic tasks which comprise most tasks (e.g. non-dynamic SQL procedures which, outside of a consulting company, is most likely what you'll be dealing with; getting into more automated code can get trickier).

Becoming familiar with Python (particularly Pandas for any basic functionality or reporting) is quick to learn but can have a monumental business impact for a company with little or poor automated processes.

Excel is very often used as a crutch, so I would strongly, strongly recommend avoiding Excel where possible (saving into it? Fine, but doing anything with the actual file? Not so good). Unless you're building macros in Excel to do what is needed for you (personally, even though I like VBA, I found it a pain to learn initially, with a steep learning curve), it's often very easy to think "I've got most of this where it is, I'll just finish it off in Excel". That might save time the first time you do it, but it means any time the same task needs to be done it's going to be time consuming and slower than just pressing run (and having to open multiple different items, waiting for tasks to finish, etc. is time consuming and frustrating); building it right the first time means it's done and can be set on a recurring schedule as needed.

If you're familiar with SQL and Python, one of the biggest things (even though it's a very basic idea and very simple) which transformed reporting for the business was introducing a workflow of:
1. Developing an SQL process to extract as much (relevant) data as possible while minimising the hit on performance.
2. Using pyodbc or cx_Oracle to connect to the database and run the query (inserting any variables as needed).
3. Using Pandas to summarise all of the data as required (bonus: use your preferred library to create some visualisations in a sharable manner).
4. Dumping the results in an Excel file (or your preferred format).
5. Having a macro which loops through and 'finishes' the files (e.g. visualisations, basic formatting, etc.; the format will always be broadly similar so macros can be designed more easily to accommodate this).

Even getting to (4) brought an enormously enhanced data accessibility and data awareness within the department, as once you have something like that, it's very easy to set it up on a scheduler.
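For anyone who wants to picture steps 1 to 4, here is a minimal sketch under stated assumptions. The `sales` table, its columns, and the date filter are made up for illustration, and the built-in `sqlite3` module stands in for pyodbc or cx_Oracle so the example runs without a database server (pyodbc also uses `?` placeholders; cx_Oracle uses named or numbered bind variables instead).

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in database; with pyodbc or cx_Oracle only the
# connection line (and possibly the placeholder style) would differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, sold_on TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('North', '2020-05-01', 100.0),
        ('North', '2020-05-02', 50.0),
        ('South', '2020-05-01', 75.0),
        ('South', '2020-04-30', 20.0);
    """
)

# Steps 1-2: pull only the relevant rows, passing variables as parameters
# rather than pasting them into the SQL string.
query = "SELECT region, sold_on, amount FROM sales WHERE sold_on >= ?"
df = pd.read_sql_query(query, conn, params=("2020-05-01",))
conn.close()

# Step 3: summarise with pandas.
summary = (
    df.groupby("region", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "total_amount"})
)

# Step 4: dump the results. CSV text is produced here to keep the sketch
# dependency-light; summary.to_excel(...) is the Excel equivalent and needs
# an engine such as openpyxl installed.
report_csv = summary.to_csv(index=False)
print(report_csv)
```

Once something like this lives in a single script, putting it on a scheduler is mostly a matter of pointing the scheduler at the script.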
 

Pau

Self-Appointed Godmother of Bruce Wayne's Children
Member
Oct 25, 2017
5,837
Anyone here done a HackerRank coding assessment specifically for data science? I've done very few HackerRank challenges on my own back in undergrad, so more than four years ago, so I'm pretty nervous. I've got about two weeks to prepare, but I really don't know what to expect.

I got invited to a 30min technical interview for next week. I genuinely don't know what to expect and I have just been reviewing a bunch of ML concepts and preparing to explain them when needed (e.g. "How does a Random Forest work?").
Anyone want to share their experiences? It's for a consultancy infamous for being hard to get into so I really really don't want to bottle it lol
How did this turn out?
 

Readler

Member
Oct 6, 2018
1,972
Anyone here done a HackerRank coding assessment specifically for data science? I've done very few HackerRank challenges on my own back in undergrad, so more than four years ago, so I'm pretty nervous. I've got about two weeks to prepare, but I really don't know what to expect.


How did this turn out?
I got into the last round and then they passed in favour of another candidate. The interview itself was pretty easy - so far having personal projects listed on my CV really paid off, as most technical interviews are just about those. We basically just discussed a project of mine in detail, i.e. what my objective was, what my data was, what tech is used, results, evaluation etc.

In the meantime I also had a coding assessment on HackerRank actually, for yet another consultancy (jury's still out on that one). Not gonna lie, it was pretty fucking tough. I had two hours and the assessment had 10 tasks. It was all about one overarching case, but the exercises themselves were independent of each other. Some of them were rather easy; in one of them I simply had to calculate the speed of a given object with the data I had (a column consisting of two timestamps, and information that the track was x meters long). Others not so much. For one I had to manually implement an encoding method I had never heard of that was only described in the task and that also literally doesn't exist on the Internet if you were to copy it off somewhere (trust me I tried lol).
Then there were also some general multiple choice questions. Some Bayesian stats, some about a Random Forest and loss functions, and another open ended question where you could write some text.
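To give a feel for the "easy" end of that range, the speed task probably looked something like the sketch below. The timestamp format, the 400 m track length, and the function name are all assumptions for illustration; the actual task data isn't given here.

```python
from datetime import datetime

TRACK_LENGTH_M = 400.0  # hypothetical value for the "x meters" in the task


def speed_m_per_s(start: str, end: str, track_length_m: float) -> float:
    """Average speed over the track, given ISO-format start/end timestamps."""
    elapsed = (
        datetime.fromisoformat(end) - datetime.fromisoformat(start)
    ).total_seconds()
    if elapsed <= 0:
        raise ValueError("end timestamp must be after start timestamp")
    return track_length_m / elapsed


# Made-up rows: each holds the two timestamps for one run.
rows = [
    ("2021-03-01 10:00:00", "2021-03-01 10:00:50"),  # 50 seconds
    ("2021-03-01 10:05:00", "2021-03-01 10:06:20"),  # 80 seconds
]
speeds = [speed_m_per_s(start, end, TRACK_LENGTH_M) for start, end in rows]
print(speeds)  # [8.0, 5.0]
```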

I didn't manage to do all the tasks, but my interviewer had already indicated that this might be the case. I prepared for it by doing some Kaggle challenges but ultimately that didn't really help much as everything was more or less new to me. I guess just having a good grasp of general concepts? I.e. if your Python is rusty, do some of that. Some basic stats is also nice, as I didn't quite expect that tbh lol Thought it would be strictly coding based.
Let me know if you have any more questions.
 
Oct 27, 2017
3,665
I'm trying to think of ways to advance our reporting capabilities, and I'm wondering if anybody here would have any ideas on a good way of doing this?

----

Brief info:
-Currently, our business reporting process is for reports to be completed in a PowerPoint format, data to be saved in Excel, and shared appropriately.
-At the moment, for a lot of the reports of our department I've been able to automate this process by building PowerPoint templates for reports, running a Python script to analyse the reports, and populating the PowerPoints with the relevant info and graphs.
-The 'new' process is 'nice' in that it's reduced week long tasks into a thirty minute job, however it's not 'nice' in the sense that the solution is rather clunky; sharing PowerPoints for these reports is not ideal and I'm trying to shift the business to dashboards where feasible (Salesforce Core, Google Analytics, Qlik - both of these are a bit limited for what I need which is why the PowerPoints still are unfortunately present). I want more powerful and interactive graphics for presentation, and much more customisable reports.
-Tableau or PowerBI would be an easy solution, but much of the executive leadership is very technology averse; requiring additional software to be installed is almost certainly going to be a no-go. This pretty much kills off the quickest path to the solution.
-One way I was thinking is to (1) generate an HTML template for the reports, (2) Python script to populate that and save the data to CSV, and (3) D3 for the visualisations using read csv to pull the data so I could send on a ZIP with the data + html and then use that in place of PowerPoints.

I'm not sure if there's a better way to do this? Whatever the solution is, it needs to be highly automated as the current solution is already quite hands-off and represented a big time reduction, so I don't want to develop a solution that increases manual effort (i.e. it would need to be as simple as running a Python script and it's complete) as resourcing on the team is already stretched incredibly thin.
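For reference, the Python side of the HTML + CSV + ZIP idea above could be as small as the sketch below. Everything named here is a placeholder (the template markup, `build_report`, `bundle`, the `month`/`revenue` columns, the file names), and the D3 part is left out; it only shows that "run one script and it's complete" is achievable with the standard library.

```python
import csv
import io
import zipfile
from string import Template

# Hypothetical report template; a real one would live in its own .html file
# and include the D3 script that reads the CSV named in data-source.
REPORT_TEMPLATE = Template(
    """<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<p>Total revenue: $total</p>
<div id="chart" data-source="$csv_name"></div>
</body>
</html>
"""
)


def build_report(title, rows, csv_name="data.csv"):
    """Return (html_text, csv_text) for one report."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["month", "revenue"])
    writer.writerows(rows)
    total = sum(revenue for _, revenue in rows)
    html_text = REPORT_TEMPLATE.substitute(
        title=title, total=f"{total:,.2f}", csv_name=csv_name
    )
    return html_text, buffer.getvalue()


def bundle(html_text, csv_text, csv_name="data.csv"):
    """Pack the report and its data into an in-memory ZIP archive."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("report.html", html_text)
        zf.writestr(csv_name, csv_text)
    return archive.getvalue()


html_text, csv_text = build_report(
    "Monthly revenue", [("2020-04", 1200.5), ("2020-05", 1499.5)]
)
zip_bytes = bundle(html_text, csv_text)
```

One thing worth checking before committing to this route: browsers commonly restrict pages opened straight from disk from loading neighbouring files, so a D3 CSV load may fail when someone just double-clicks the HTML. Embedding the data in the HTML itself would sidestep that.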

----

More detail if needed:
The IT systems and backend design are incredibly convoluted and outdated, and multiple different platforms (each with separate ETL processes resulting in discrepancies across systems) are used. There are few VMs available for use (pretty much all are constantly at capacity), and we don't have an internal web platform to develop on.

In general, most development had previously been outsourced to various third party vendors. Recently Salesforce platforms have been adopted in addition to the existing ones (and although Salesforce is okay, the cost makes me reluctant to adopt the analytics platform and licences for others in the company as it makes us very dependent on it going forward). Similarly, the industry has been heavily hit by COVID and additional platforms or costs need to almost universally be approved by the CFO and MD, resulting in solutions involving existing technology. I've a large degree of experience in Python and SQL (in analytics, data engineering, and data modelling), have a growing role in front-end design, have admin access over most platforms we're using, and have experience developing the backend of a Django fintech application in a consulting firm. My range of experience in 'transformative data insights projects' however is very low, and I've not had much exposure to what the best approach should be given the reluctance of any new technology to be adopted (a BI tool would just be the simplest), so although I'm sure I could build a solution, if I'm going to put the time in I want to make sure I do it the right way.

Any ideas would be really useful!
 

Readler

Member
Oct 6, 2018
1,972
Anyone who's done a "data science case interview"? I'm in the final round for a pretty hot job tomorrow and the company seems keen also. I just need to not bottle tomorrow's interview, though they did tell me there's not much I can prepare. They didn't say much about it, except that they want to see how I'd tackle a hypothetical project we'd go through together.
 

harry the spy

Member
Oct 25, 2017
3,075
Cool thread!

Anyone who's done a "data science case interview"? I'm in the final round for a pretty hot job tomorrow and the company seems keen also. I just need to not bottle tomorrow's interview, though they did tell me there's not much I can prepare. They didn't say much about it, except that they want to see how I'd tackle a hypothetical project we'd go through together.

It very much depends on the specifics - Is it a take home exam or live? Does it involve coding?

Anyone here done a HackerRank coding assessment specifically for data science? I've done very few HackerRank challenges on my own back in undergrad, so more than four years ago, so I'm pretty nervous. I've got about two weeks to prepare, but I really don't know what to expect.
Hope it went well.
 

Qwark

Member
Oct 27, 2017
8,017
It's a 90mins live thing. They said it doesn't involve coding (which would arguably be tough to pull off via Teams), but more like an abstract way to go through a project.
We've done interviews like this before. It's really more to see how you would go about solving the problem, not the actual solution. And to get a grasp of what key concepts you have knowledge of. I'm just speaking about ours now, but we made them pretty difficult, so you might feel like you bombed it hard, but that's really just to get a grasp of your knowledge and where you can develop more. So if it's like that, don't be surprised and don't feel too bad if you think you did terrible.

And for the love of God, don't make stuff up if you don't know. You can say "If I was to guess, I would say it is ____", but be honest about what you know and don't know, because they will be aware.
 

Readler

Member
Oct 6, 2018
1,972
We've done interviews like this before. It's really more to see how you would go about solving the problem, not the actual solution. And to get a grasp of what key concepts you have knowledge of. I'm just speaking about ours now, but we made them pretty difficult, so you might feel like you bombed it hard, but that's really just to get a grasp of your knowledge and where you can develop more. So if it's like that, don't be surprised and don't feel too bad if you think you did terrible.

And for the love of God, don't make stuff up if you don't know. You can say "If I was to guess, I would say it is ____", but be honest about what you know and don't know, because they will be aware.
That sounds exactly the kind of thing they were telling me about. Could you maybe be more specific and maybe even give some examples?

Saying you made them "pretty difficult" has made me a bit nervous now haha
 

Qwark

Member
Oct 27, 2017
8,017
That sounds exactly the kind of thing they were telling me about. Could you maybe be more specific and maybe even give some examples?

Saying you made them "pretty difficult" has made me a bit nervous now haha
Sure, here's some examples from our list of questions. We didn't really have 1 hypothetical project, it was general knowledge questions followed by a series of smaller hypothetical challenges. Also, it should be noted that this was for a QA position, an engineer interview would be a little more challenging than this.

  • Are you familiar with SQL? OLAP/Cubes? SSRS?
  • What is the difference between a primary key and a foreign key?
  • What is a composite key?
  • What is the difference between an inner join and an outer join?
  • What is the difference between NULL and 0 in terms of SQL?
  • What would be a reason to denormalize data in a database?
  • Can you tell me what an index is, and what the difference between a clustered and non-clustered index is?
  • Can you tell me the differences or pros and cons of row-store vs column-store databases?
  • What is performance testing, and how might you do performance testing of reports, dashboards, and ETL procedures generated from a database?
  • How would you test that data is correctly loaded from a source database into a target database?
  • Can you walk me through how you would identify duplicate records in a database table?
I felt kind of bad afterwards, because it seemed pretty difficult to the candidates. We offered the position to one that did not know all the answers, but she was honest about it, and seemed very willing to learn.
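For anyone wanting to practise a few of the questions above, here is a rough sketch covering duplicates (two ways), inner vs outer joins, and NULL vs 0. It's only an illustration: the `customers`/`orders` tables and their rows are made up, and SQLite is used via Python's built-in `sqlite3` purely so it runs anywhere, so syntax and behaviour on other engines may differ in places.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),  -- foreign key
        discount REAL                                   -- NULL = unknown, 0 = none
    );
    INSERT INTO customers VALUES (1, 'a@example.com'), (2, 'b@example.com'),
                                 (3, 'a@example.com');
    INSERT INTO orders VALUES (10, 1, 0), (11, 1, NULL), (12, 2, 5.0);
    """
)

# Duplicates, method 1: group on the columns that define "the same record".
dupes_group = conn.execute(
    "SELECT email, COUNT(*) FROM customers GROUP BY email HAVING COUNT(*) > 1"
).fetchall()

# Duplicates, method 2: self-join, keeping rows that have an earlier twin.
dupes_join = conn.execute(
    "SELECT b.id FROM customers a JOIN customers b "
    "ON a.email = b.email AND a.id < b.id ORDER BY b.id"
).fetchall()

# Inner join drops customers without orders; left outer join keeps them.
inner_ids = conn.execute(
    "SELECT DISTINCT c.id FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()
outer_ids = conn.execute(
    "SELECT DISTINCT c.id FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()

# NULL is "no value", not zero: it matches neither = 0 nor <> 0.
zero_rows = conn.execute("SELECT id FROM orders WHERE discount = 0").fetchall()
null_rows = conn.execute("SELECT id FROM orders WHERE discount IS NULL").fetchall()
nonzero_rows = conn.execute("SELECT id FROM orders WHERE discount <> 0").fetchall()
conn.close()
```

In this toy data, customer 3 shares an email with customer 1 and has no orders, so it shows up as the duplicate and only appears in the left join result; order 11 (NULL discount) is returned by neither the `= 0` nor the `<> 0` filter.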
 

Readler

Member
Oct 6, 2018
1,972
Sure, here's some examples from our list of questions. We didn't really have 1 hypothetical project, it was general knowledge questions followed by a series of smaller hypothetical challenges. Also, it should be noted that this was for a QA position, an engineer interview would be a little more challenging than this.

  • Are you familiar with SQL? OLAP/Cubes? SSRS?
  • What is the difference between a primary key and a foreign key?
  • What is a composite key?
  • What is the difference between an inner join and an outer join?
  • What is the difference between NULL and 0 in terms of SQL?
  • What would be a reason to denormalize data in a database?
  • Can you tell me what an index is, and what the difference between a clustered and non-clustered index is?
  • Can you tell me the differences or pros and cons of row-store vs column-store databases?
  • What is performance testing, and how might you do performance testing of reports, dashboards, and ETL procedures generated from a database?
  • How would you test that data is correctly loaded from a source database into a target database?
  • Can you walk me through how you would identify duplicate records in a database table?
I felt kind of bad afterwards, because it seemed pretty difficult to the candidates. We offered the position to one that did not know all the answers, but she was honest about it, and seemed very willing to learn.
Thanks for that, appreciate it.

That being said, I know the answer to like four questions lol
I barely work with SQL though.
 

Qwark

Member
Oct 27, 2017
8,017
Thanks for that, appreciate it.

That being said, I know the answer to like four questions lol
I barely work with SQL though.
Lol, I don't know all the answers either and I have this job. There's a ton of different positions related to data science so it really just depends. We're a SQL shop but you might not need to know anything about SQL. Best of luck!