10 Portfolio Projects You Can Try As An Entry-Level Data Analyst/Scientist By Daniel Madu
Written by Qudirah on Medium
I hate the word newbie. If you are in a hurry, skip to the third paragraph. I always do this “catching up” thing before going straight to the point.
It’s April. I missed out on writing twice last month, so I will have to write four times this month. I do have an explanation though: I was hooked on some personal projects (and work, of course), I participated in my first hackathon!! and I applied to Outreachy again (the story isn’t so different from the last one). March was chaotic, but I am glad I participated in so many things. Hopefully, everything works out in the end.
If I am not mistaken, my entire journey clocked a year in February. Looking at how I started, my growth has honestly been exponential, and I have God, ofc, and my love for projects to thank. Best believe that if I learn how to spell ‘A’, I will enter a spelling bee. I constantly set myself up (I regret it sometimes), which has made me learn so much in so little time. Since I read a lot of Medium posts about data, I tend to have ideas about a lot of things, so even if you ask me to jump on a project with you and I have just a little idea of what it is, I will still jump on it. I am always willing to learn and to put what I am learning to the test. Another thing about me is that I despise common projects. I just will not do a project if it is common. So do not expect the COVID analysis or the Titanic project on the list. Ofc you can try them for practice, but honestly not for your portfolio.
In the course of my journey, here are 10 projects I have worked on to build my portfolio/career.
1. Crop Recommendation System
Tools used: Python, HTML, CSS, Flask, Basic ML knowledge
Difficulty: Easy
This was the first project I ever did and even though I hate it so much now, I’m so proud of it. I built a decision tree model that recommends the best crop for given weather and soil conditions. I deployed it locally using Flask. I currently have a terrible version of the project on my GitHub, so I do not want to link it. When I push a better version, I will link it here.
2. Movie recommender system
Tools used: Python, Knowledge of NLTK and Cosine Similarity, Heroku, Streamlit
Difficulty: Medium
Now, this was my second project but it was nothing like the first. It uses NLP and cosine similarity. I had just finished Andrew Ng’s Machine Learning course on Coursera and watched a TMDB movie recommender tutorial on YouTube, so I built one on the Netflix dataset. I also used Streamlit to give users an interface and even deployed it on Heroku. For me, this is the hardest project I have ever done. I even cried. I have since learned better ways to do things, but I did learn a lot from it. This is a link to the GitHub. It needs some tidying but it’s not that terrible.
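If cosine similarity is new to you, here is a minimal sketch of the idea in plain Python. It is not the code from my project: the catalog below is made up, the function names are just for illustration, and it uses raw word counts where a real project would clean the text first (with NLTK, for example).

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts, using raw word counts as vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the words both texts share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, catalog, top_n=2):
    """Return the top_n titles whose descriptions are most similar to `title`."""
    query = catalog[title]
    scores = [
        (other, cosine_similarity(query, description))
        for other, description in catalog.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:top_n]]


# Made-up catalog, for illustration only.
catalog = {
    "Space Heist": "crew steals spaceship in daring space heist",
    "Star Runners": "crew races spaceship across space",
    "Baking Duel": "two bakers compete in cake contest",
}

print(recommend("Space Heist", catalog, top_n=1))
```

The more words two descriptions share, relative to their lengths, the closer the score gets to 1, which is why the other space movie comes out on top here.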
3. Forbes 2022 EDA using Python
Tools used: Python (Pandas and Matplotlib)
Difficulty: Easy
This was the first EDA project that I published. I wrote about it too on this link. The project was easy, and it made me realize you learn from small projects too. I revised my knowledge of Pandas and Matplotlib. I also learned how to ask the right questions, and how analysis is targeted toward uncovering something. A whole lot of people got to know me through this project too. This is a GitHub link to the project.
4. Market Basket Analysis
Tools used: Python (pandas, matplotlib, association rules)
Difficulty: Medium
I haven't posted about this project yet but it’s one of the projects I think a data analyst should try. You get to understand association rules, how products in a company sell, which products sell best together, how a high-sales product can help sell a low-sales one, and so on. I enjoyed learning and doing this one and might push it to my GitHub soon, but before then you should research and try it. Once you understand the idea, it is not hard.
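To give you a feel for association rules before you research them, here is a small sketch in plain Python. The baskets are made up and `pair_rules` is a name I chose for this example. It only looks at pairs of items, while a real project would use an algorithm like Apriori from a library. Support is how often the items appear together, confidence is how often the second item shows up when the first one does, and lift compares that confidence with how popular the second item is on its own.

```python
from itertools import combinations


def pair_rules(transactions, min_support=0.3):
    """Build 'if X then Y' rules for item pairs that meet min_support."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))
    # Fraction of baskets that contain each single item.
    support = {item: sum(item in b for b in baskets) / n for item in items}

    rules = []
    for a, b in combinations(items, 2):
        pair_support = sum(a in bk and b in bk for bk in baskets) / n
        if pair_support < min_support:
            continue
        # Each frequent pair gives two rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = pair_support / support[x]
            lift = confidence / support[y]
            rules.append({
                "if": x,
                "then": y,
                "support": pair_support,
                "confidence": confidence,
                "lift": lift,
            })
    return rules


# Made-up baskets, for illustration only.
transactions = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["butter", "jam"],
    ["bread", "butter", "jam"],
]

for rule in pair_rules(transactions):
    print(rule)
```

A lift above 1 suggests the two items are bought together more often than their individual popularity would predict, and that is the kind of pair you would consider bundling.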
5. Implementing the Gale-Shapley Stable Matching Algorithm
Tools used: Python
Difficulty: Medium
Now, this isn’t a data-related project. I attended a Python-oriented academy program last year and was fortunate enough to implement this algorithm in Python. It is such an interesting algorithm. The Gale-Shapley algorithm is aimed at producing a stable matching. Picture two groups, classically men and women, where everyone ranks the members of the other group by preference. The end goal is that everyone gets matched and no two people would both rather be with each other than with the partners they ended up with. I don’t think I am explaining it well enough. I might dedicate a whole post to it, but before then you can read about it on Google.
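Since words are failing me, here is a minimal sketch of the algorithm in Python. It is not the code I wrote in the academy. It assumes both groups are the same size and that everyone ranks every member of the other group, and the names and preference lists are made up.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as a dict {proposer: receiver}.

    Assumes equal group sizes and complete preference lists.
    """
    # rank[r][p] is the position of proposer p in r's list (lower is better).
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next receiver to try
    engaged_to = {}                               # receiver -> current proposer
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p            # r was free, so r accepts for now
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p            # r trades up and the old partner is free again
            free.append(current)
        else:
            free.append(p)               # r says no, so p will try the next choice

    return {p: r for r, p in engaged_to.items()}


# Made-up preference lists, most preferred first.
proposers = {"A": ["X", "Y", "Z"], "B": ["X", "Z", "Y"], "C": ["Y", "X", "Z"]}
receivers = {"X": ["B", "A", "C"], "Y": ["A", "C", "B"], "Z": ["A", "B", "C"]}

print(gale_shapley(proposers, receivers))
```

In this example A would rather be with X, but X is matched with B, who is X's first choice, so X has no reason to switch. That is what makes the matching stable.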
6. The Bechdel test
Tools used: Tableau, Python (for analysis)
Difficulty: Easy
The Bechdel Test checks whether a movie has at least one scene where a woman speaks to another woman about something other than a man. I will definitely write a post about this project. It’s one of the ones that hooked me on the first read. The moment I heard of this test, I wanted to do something with it to tell people about it. I linked it with the evolution of feminism and researched whether feminism has improved how society views women. To do that, I grouped the years into different centuries and observed the number of movies that passed the test over the years. I even made a Tableau visualization for it but I haven’t perfected it yet. I haven’t posted about it either.
7. Sentiment Analysis Project
Tools used: Python, NLTK, Power BI
Difficulty: Easy
I did a sentiment analysis project when Black Panther 2 came out and I did another recently with two different libraries. It’s quite easy to do and I think it’s something every data analyst should try. I even visualized it using Power BI and I dared to use a black background. Yes. I did that. Here is a link to the post: Black Panther.
8. Data science job salaries
Tools used: PostgreSQL, Excel, Power BI
Difficulty: Medium
Again, this is one of the projects that put me out there. I got so many reviews and so much feedback on it. I used Excel for cleaning, SQL for the analysis, and Power BI for visualization. I wrote about it and published it too on this link. The data came from this link, and I explored the salaries of data professionals by profession, mobility, employment type, and more. I used window functions and subqueries and, honestly, I was able to properly practice what I had learned.
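If you have not met window functions and subqueries yet, here is a small sketch of both in one query. I used PostgreSQL for the project, but this example runs on SQLite through Python's built-in `sqlite3` module so you can try it without installing anything (it needs SQLite 3.25 or newer for window functions). The table, column names, and numbers are made up and are not from the real dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salaries (job_title TEXT, employment_type TEXT, salary_usd INTEGER)"
)
# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?)",
    [
        ("Data Analyst", "FT", 60000),
        ("Data Analyst", "FT", 80000),
        ("Data Scientist", "FT", 120000),
        ("Data Scientist", "CT", 100000),
        ("Data Engineer", "FT", 110000),
    ],
)

# The subquery keeps only salaries above the overall average.
# The window function then ranks the remaining salaries within each job title.
query = """
SELECT job_title,
       salary_usd,
       RANK() OVER (PARTITION BY job_title ORDER BY salary_usd DESC) AS rank_in_title
FROM salaries
WHERE salary_usd > (SELECT AVG(salary_usd) FROM salaries)
ORDER BY job_title, rank_in_title
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The nice thing about a window function is that it adds the rank next to each row instead of collapsing the rows the way GROUP BY does.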
9. Classification of a phishing mail
Tools used: Python
Difficulty: Hard
This is one of the toughest projects I have worked on. I built models that classify emails as phishing or non-phishing using email structure, stylometric features, and so on. It took quite a while. I worked on feature extraction, data cleaning, dimensionality reduction, cross-validation, and model building. I explored different evaluation methods too. I haven't pushed this to my GitHub either but I will soon. I don’t think I can make a post about it though.
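To show what feature extraction can look like, here is a tiny sketch that turns an email into a few numbers a model could learn from. These four features and the name `extract_features` are my picks for this example only. They are not the features from my project, and the sample email is made up.

```python
import re


def extract_features(email_text):
    """Turn raw email text into a few simple structural and stylometric features."""
    words = re.findall(r"[A-Za-z']+", email_text)
    letters = [c for c in email_text if c.isalpha()]
    return {
        # Structural: how many links the email contains.
        "num_links": len(re.findall(r"https?://\S+", email_text)),
        # Stylometric: punctuation habits, word length, and shouting in capitals.
        "num_exclamations": email_text.count("!"),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "uppercase_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
    }


# Made-up email, for illustration only.
sample = "URGENT! Verify your account now: http://example.test/login"
print(extract_features(sample))
```

You would run something like this over every email, put the results in a table, and then move on to cleaning, dimensionality reduction, and model building.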
10. Open Source Contribution
There are still some more projects to talk about, but the number 10 project is contributing to open source. I learned unit testing, Git, and so much more through open source. It is something I don't do often because I always have little jobs that keep me occupied, but once I have a full-time job, I will definitely become a regular contributor. There is so much to learn and open source is one of the fastest ways to learn it.
Now we have come to the end of today’s post. Hit like if you enjoyed reading, let me know your thoughts, and if you will be trying any of these out, let me know too! If you want more project suggestions, let me know. I will make another post about a few.