

A Better Machine Learning Engineer Cares About Coding


Photo by Manki Kim on Unsplash
 

    One of my first Machine Learning gigs was to 'fix' the code for a Machine Learning model that had been developed on macOS so that it would work on Windows. It sounded simple enough, and I was new to the platform, so I put in a low bid.

    Soon after, I got a DM with links to the code repository, asking me to reproduce the results before I could be chosen. I was skeptical, but the job poster said 21 - yes, TWENTY-ONE - people had attempted it and failed, so the poster was now wary of anyone who said they could do it. I looked at the other proposals: they had all bid much, much higher than me, and the bidders were pretty established.

    I completed the task in an hour. A few more hours to let the model run, and I had the results in less than half a day. The job poster was blown away. What did I do differently?

1. Read the README file. 

    I can't stress this enough. I'm a software developer by trade, and I've had my fair share of dealings with code that works only in certain environments, only on the developer's machine, and so on. We can avoid these problems if we check the documentation first. In this case, there were only the code and the README file. There was no requirements file (for a Python codebase, this lists the packages needed and, optionally, their versions) and no sample input/output datasets to verify against.

    Just by reading the README, I knew that the code was written for version 3.8.3 of the Gensim library. Therefore, all the other libraries I needed to install had to play nicely with it.
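
    Here is a minimal sketch of how that README detail could be turned into an automatic check. The `PINS` dictionary and the `find_mismatches` helper are hypothetical names made up for illustration; only the Gensim version comes from the README described above, and the repository itself shipped nothing like this.

```python
from importlib import metadata

# Pins copied by hand from the README. Only the Gensim version comes from
# the post; any other entries would be whatever the README dictates.
PINS = {"gensim": "3.8.3"}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (wanted, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in pins.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(PINS).items():
        print(f"{package}: README wants {wanted}, environment has {found}")
```

    Running this inside the environment prints one line per package that is missing or at the wrong version, and prints nothing when everything matches.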

2. Always use a virtual environment.

    It doesn't matter whether you use a Conda virtual environment or venv, USE IT. Python, and in fact most ecosystems (.NET included), is pretty nitpicky about the versions of the packages you use. By using a virtual env, you isolate the code's requirements and avoid breaking any other codebases on your machine that need different versions of Python and so on.
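
    As a rough sketch, a virtual environment can be created from Python itself with the standard library's `venv` module. The `make_env` helper and the `.venv` directory name are my own assumptions for illustration, not something from the job's repository.

```python
import venv
from pathlib import Path


def make_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its config file path."""
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    # Every environment created by venv has a pyvenv.cfg file at its root.
    return env_dir / "pyvenv.cfg"


# Example (hypothetical directory name):
#     make_env(".venv")
```

    Note that `venv` builds the environment from the interpreter that runs it, so choosing a different Python version means running this with that interpreter, or letting Conda install one.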

    At first I created my virtual env using my default Python version, 3.10, and it didn't work. I kept getting a cryptic error from Gensim. Since the code repo was shared publicly, I didn't think the code was buggy, so I Googled a bit about Gensim 3.8.3. It turns out it only works with Python 3.9 and below. I created a new virtual env with Python 3.9, ran the code and voilà! It worked!
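
    A cryptic library error like that can be swapped for a readable one with a small guard at the top of the entry script. This is only a sketch: the 3.9 ceiling is the figure I found while Googling, not something I verified against Gensim's own metadata, and `python_is_supported` and `require_supported_python` are hypothetical helper names.

```python
import sys

# Assumption taken from the post: Gensim 3.8.3 ran on Python 3.9 and below.
MAX_SUPPORTED = (3, 9)


def python_is_supported(version=None, max_supported=MAX_SUPPORTED):
    """Return True when the (major, minor) version is at or below the ceiling."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) <= max_supported


def require_supported_python():
    """Exit with a readable message instead of a cryptic import error."""
    if not python_is_supported():
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in MAX_SUPPORTED)
        sys.exit(f"Python {found} found, but this project needs {wanted} or older.")
```

    Calling `require_supported_python()` before importing Gensim would have told me in one line what took a round of Googling to figure out.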

The Takeaway

    Coding is a key skill for Machine Learning Engineers. Knowing how to tune machine learning models, or which types of ML/Deep Learning models are out there, is not enough. I know quite a few people who just copy and paste code from TensorFlow or Stack Overflow. You can get away with it, but you won't be a good Machine Learning Engineer.

    Just as in any other profession, master your tools to master your trade. In this case, all it took was simple dependency resolution and sticking to coding best practices. Simple, but something that 21 people didn't do.

