Planet London Python

January 10, 2019

Ian Ozsvald

“discover feature relationships” – new EDA tool

I’ve built a new Exploratory Data Analysis tool. I used it in a few presentations last year, with the code on GitHub, and have now (finally) published it to PyPI.

The goal is to use machine learning (sklearn’s Random Forests) to quickly check whether any column in a DataFrame predicts any other column. I’m interested in the question “what relationships exist in my data?”, particularly if I’m working in an unknown domain and on new data. I’ve used this on client projects during the discovery phase to learn more about the sort of questions I should ask a client.
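To make the idea concrete, here is a minimal sketch of "does any column predict any other column?", assuming pandas and scikit-learn are installed. It is not the tool's own code or API: the function name `discover_relationships`, predicting each target from one feature at a time, one-hot encoding non-numeric features, shuffled 3-fold cross-validation, and scoring numeric targets with R² and non-numeric targets with accuracy are all illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score


def discover_relationships(df: pd.DataFrame, cv: int = 3, random_state: int = 0) -> pd.DataFrame:
    """Hypothetical sketch: score how well each single column predicts each other column.

    Returns a feature x target grid; the diagonal is NaN (a column isn't used to predict itself).
    """
    results = []
    for target in df.columns:
        y = df[target]
        # Assumption: numeric targets are treated as regression, everything else as classification
        is_numeric = pd.api.types.is_numeric_dtype(y)
        for feature in df.columns:
            if feature == target:
                continue
            # One-hot encode the single feature so string/categorical columns can be used;
            # numeric columns pass through get_dummies unchanged
            X = pd.get_dummies(df[[feature]])
            if is_numeric:
                model = RandomForestRegressor(n_estimators=50, random_state=random_state)
                splitter = KFold(n_splits=cv, shuffle=True, random_state=random_state)
                scoring = "r2"
            else:
                model = RandomForestClassifier(n_estimators=50, random_state=random_state)
                splitter = StratifiedKFold(n_splits=cv, shuffle=True, random_state=random_state)
                scoring = "accuracy"
            # Mean out-of-fold score: how well this one feature predicts the target
            score = cross_val_score(model, X, y, cv=splitter, scoring=scoring).mean()
            results.append({"feature": feature, "target": target, "score": score})
    # Pivot into a grid, read like a correlation matrix (rows = features, columns = targets)
    return pd.DataFrame(results).pivot(index="feature", columns="target", values="score")


if __name__ == "__main__":
    # Small synthetic example: y_double and group depend on x, noise depends on nothing
    n = 60
    demo = pd.DataFrame({
        "x": list(range(n)),
        "y_double": [2 * i for i in range(n)],
        "group": ["low" if i < 30 else "high" for i in range(n)],
        "noise": np.random.default_rng(0).normal(size=n),
    })
    print(discover_relationships(demo).round(2))
```

Each cell answers "how well does this column, on its own, predict that one?": a high score hints at a relationship worth asking a client about, while an R² near zero or below suggests none. Accuracy and R² are on different scales, so it is safer to compare scores within a target column than across the whole grid.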

The GitHub README includes a screenshot, using the Titanic classification and Boston regression examples, which will give you an idea of what the tool produces.

This is a very light project at the moment. I think the idea has value and I’m very open to feedback.


Ian is an Interim Chief Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.

The post “discover feature relationships” – new EDA tool appeared first on Entrepreneurial Geekiness.

by Ian at January 10, 2019 08:00 PM

January 07, 2019

Ian Ozsvald

Looking back on 2018, looking to 2019

So last year was a damned hard year – ignoring Brexit and other international foolishness, on a personal level (without going into details) by mid-year I was emotionally wiped out. A collection of health issues among family and friends kept rearing their ugly heads, and over time I ran very low on emotionally supportive energy to share. Our not-so-old cat suddenly dying of kidney failure just about topped the year off. Thankfully, by Christmas most of the health issues had sorted themselves out, massively reducing the stress they caused.

From August to December I deliberately worked at a much lighter pace to give myself time to recuperate. That’s paid off well, and by Christmas I could consider myself “reasonably back to my old self”. Sometimes it pays to just be kind to ourselves.

This led to an odd situation later in the year when I was given the NumFOCUS Community Service Award – I had to accept it with a bit of a wry grin, as by that point I’d already stepped back from many organisational roles in PyDataLondon. The lovely outcome of stepping back was that…nothing really changed for PyDataLondon. I’m immensely proud of the organising team we’ve built; everything just kept ticking along nicely. I’m now back to being more involved, and I’m happy to say we’ve got so many suggested talks coming through that we’re already scheduled for a chunk of the year ahead.

The continued growth in our PyDataLondon community (with 8,500+ members – AFAIK we’re the largest data science event in the UK) and the wider PyData community (over 127 international PyData communities) is lovely to see. I helped open the PyDataPrague meetup a few months back and was happy to share some of our lessons from growing our London community.

I’m also very happy to see the PyData conferences experiment with more non-traditional sessions. At PyDataLondon 2018 we added a creche and ran sprints and workshops like “making your first open source contribution” and “understanding how git works” to help attendees get more involved in our open source ecosystem. Last year we also had art and political hackathons and a women-focused lunch. At PyDataAmsterdam we ran some similar experiments, and I know others were tried at other events. This year I’m looking forward to seeing even more experiments; we’ll certainly run more at PyDataLondon 2019 (July 12-14).

Out of all of this there are a few things I’m particularly proud of:

  • Our volunteer efforts at PyDataLondon 2018 raised £91,000 for NumFOCUS, towards grants and work that support open source
  • We saw the opening of 6 regional PyData events in the UK (by recency: Oxford, Cambridge, Manchester, Edinburgh, Bristol, Cardiff)
  • I got to speak at a variety of international events on ways of tackling new data science projects, on high performance and on how NumFOCUS works
  • Through the “making your first open source contribution” sessions I ran, I helped several groups of people start contributing to Python projects on GitHub

Whilst 2018 had some tough moments, I’m really happy with the positive things that happened.

Separately from all of this, Chris and I have started to shut down ModelInsight after 5 years of collaboration. We only lightly worked on our consultancy in 2017 and we hardly touched it in 2018. The market for the combination of data science and data engineering that we were interested in exploring never grew. We had a lot of fun with our clients, but it didn’t feel like we were taking the business anywhere special. Shutting it down was the right call.

I continue with my usual activities under my own name. In a few weeks I’m running a new course on Successfully Delivering Data Science Projects, and I have other training planned. I’ve started authoring online videos for Pluralsight, and I continue to coach teams as Interim Chief Data Scientist, whilst my jobs list continues to help companies recruit and folk find new jobs.

I’m also trying a few personal-focus experiments. Since Christmas I’ve set time limits on my Android phone: a maximum of 5 minutes a day on Twitter and 10 minutes a day on Reddit. I’ve also blocked The Independent (my preferred news site) in Firefox to curb my time-wasting habits. I’ve set aside a day for personal development (I have such a pile of interesting math & data science stuff I want to read). Ask me in a few months how this is all turning out.



The post Looking back on 2018, looking to 2019 appeared first on Entrepreneurial Geekiness.

by Ian at January 07, 2019 02:54 PM