Durban Data Science

Previous Meetings

7 April 2016

Monthly Review: What's New? (Ryan, 5 min)
Notes on recent developments in Data Science.
Data Manipulation with dplyr (Brian, 30 min)
dplyr is a powerful package for data munging. Brian will be taking it through its paces and showing us the highlights. He uses dplyr routinely as part of his workflow.
Predicting Marathon Times (Andrew, 10 min)
Marathon times are traditionally predicted using Riegel's Formula. This gives remarkably accurate results, but they come without any indication of uncertainty. Andrew will be talking about an alternative model (and Shiny application) that he has developed for predicting marathon times.
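For reference, Riegel's Formula in its commonly cited form is T2 = T1 * (D2 / D1)^1.06. The sketch below is only an illustration of that formula, not Andrew's alternative model; the function name and the default exponent argument are assumptions made for the example.

```python
def riegel_time(known_time_s, known_dist_km, target_dist_km, exponent=1.06):
    """Predict a race time with Riegel's Formula: T2 = T1 * (D2 / D1) ** exponent.

    known_time_s   -- time for a race already run, in seconds
    known_dist_km  -- distance of that race, in km
    target_dist_km -- distance to predict for, in km
    exponent       -- fatigue factor; 1.06 is the conventionally quoted value
    """
    return known_time_s * (target_dist_km / known_dist_km) ** exponent


# Example: a 1:45:00 half marathon projected to the full marathon distance.
half_marathon_s = 1 * 3600 + 45 * 60
predicted_s = riegel_time(half_marathon_s, 21.0975, 42.195)
hours, rest = divmod(round(predicted_s), 3600)
minutes, seconds = divmod(rest, 60)
print(f"Predicted marathon time: {hours}:{minutes:02d}:{seconds:02d}")
```

Note that the result is a single point estimate, which is exactly the limitation the talk addresses.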

3 March 2016

Monthly Review: What's New? (Ryan, 5 min)
Notes on recent developments in Data Science.
High performance with Microsoft R Open & R Server (Andrie de Vries, 20 min)
Andrie de Vries (Microsoft) will be talking about various enhancements that Microsoft has made to R to make it more performant. For example, Microsoft R Open includes Intel MKL for fast linear algebra and matrix computation. R Open is available as an Open Source project and free of charge. Andrie will demonstrate the enhancements in R Server to deal with objects that are much larger than available memory. R Server comes with compute contexts for SQL Server as well as Hadoop. Students and developers can get a fully featured free download of R Server. Andrie will also demonstrate the forthcoming R Tools for Visual Studio, an IDE for R based on Microsoft Visual Studio.
Visualisation and Dashboards with Tableau (Andrew, 15 min)
I've been messing around with a trial license for Tableau. It has some pretty cool features and I have made some visualisations that are bordering on artistic. I'll be giving a short demo.

4 February 2016

Monthly Review: What's New? (Ryan, 5 min)
Notes on recent developments in Data Science.
Shiny 101 (Dane, 15 min)
An introduction to Shiny in R. A brief presentation aimed at encouraging people to develop simple web apps for special projects or business analysis purposes.
Using Python for Data Science (Gary, 15 min)
R isn't the only tool for Data Science! I will demonstrate the use of Python to build a simple decision tree and fit multilinear and logistic regression models.
Collaborative Filtering (Andrew, 20 min)
How to leverage user-based and item-based collaborative filtering to generate a recommendation engine. I'll explain the ideas with a simple example and then look at how it can be used to generate intelligent game recommendations.
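As a rough illustration of the user-based variant, the sketch below predicts a missing rating as a similarity-weighted average of other users' ratings. The ratings, user names and game names are hypothetical toy data, and cosine similarity is just one common choice; none of this is taken from the talk itself.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {game: rating}); not data from the talk.
ratings = {
    "ann": {"chess": 5, "poker": 3, "slots": 1},
    "bob": {"chess": 4, "poker": 3, "slots": 1, "bingo": 5},
    "cat": {"chess": 1, "poker": 2, "slots": 5, "bingo": 1},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        num += sim * their_ratings[item]
        den += sim
    return num / den if den else None


# Ann has not rated "bingo"; her tastes are closer to Bob's than to Cat's,
# so the prediction is pulled towards Bob's rating.
print(predict("ann", "bingo"))
```

Item-based filtering follows the same pattern with the roles swapped: similarities are computed between items, and a user's own ratings of similar items are averaged.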

11 January 2016

Monthly Review: What's New? (Ryan, 5 min)
Notes on recent developments in Data Science.
Feedback from Kaggle Teams (Various, 10 min)
Progress made on Kaggle entries in the last month.
Data Science at Derivco (Andrew Slabbert, 15 min)
Thoughts on how Data Science can be applied at Derivco.
Linear Models: Background and an Application (Andrew, 10 min)
Linear regression is a very simple model, but it can still produce powerful results. In this talk I will briefly review the process of fitting a linear regression model in R and then talk about an application that we have been working on.
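The talk itself uses R, but the underlying fit is easy to sketch. Below is a minimal ordinary least squares fit for a single predictor, written in Python purely for illustration; the function name and the toy data are assumptions, not material from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data lying exactly on y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```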
CARET: predictive modelling in R (Etienne, 15 min)
Creating, implementing and training predictive models can at first seem daunting and can be time consuming, especially when done from scratch. Fortunately, R provides a package called CARET (short for Classification And REgression Training), which makes it easy to train various machine learning algorithms such as Random Forests, Boosting and Logistic Regression. CARET also provides routines for managing and structuring sampling windows, an important part of feeding data into a model during its training phase. This presentation will give a short overview of the package and an example application to Churn.

2 December 2015

Monthly Review: What's New? (Ryan, 5 min)
Notes on Data Science.
Feedback from Kaggle Teams (Various, 12 min)
Progress made on Kaggle entries in the last month.
The Walmart Challenge (Andrew, 10 min)
Some notes on how I am attacking the Walmart Challenge competition.
Improving Image Quality with Adaptive Thresholding (Yasthil, 15 min)
Image pre-processing, enhancement and segmentation are vital steps in the image processing pipeline. It’s rare to find a set of images that will produce good classification results without first removing noise from the images. Noise can take the form of shadows, bad lighting or blur. Shadows are a particularly difficult problem, which can be addressed using Adaptive Thresholding. I will give a brief introduction to image Segmentation and Thresholding before focusing specifically on Adaptive Thresholding. I will illustrate the advantages of Adaptive Thresholding and show how it can be used to sharpen the outline of characters embedded in an image.
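To make the idea concrete, here is a minimal sketch of one common variant, mean-based adaptive thresholding, where each pixel is compared against the mean of its local neighbourhood rather than a single global cut-off. The function names, the `radius` and `offset` parameters and the tiny toy image are assumptions for illustration; the talk may well use a different variant or library.

```python
def global_threshold(image, cutoff):
    """Mark a pixel as ink (1) if it is darker than one fixed cut-off."""
    return [[1 if value < cutoff else 0 for value in row] for row in image]


def adaptive_threshold(image, radius=1, offset=10):
    """Mark a pixel as ink (1) if it is darker than its local mean minus `offset`.

    The local mean is taken over a square window of side 2 * radius + 1,
    clipped at the image borders.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            rows = range(max(0, y - radius), min(height, y + radius + 1))
            cols = range(max(0, x - radius), min(width, x + radius + 1))
            window = [image[r][c] for r in rows for c in cols]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if image[y][x] < local_mean - offset else 0
    return out


# Hypothetical toy image: the background brightens from left to right (a
# shadow on the left), with two dark strokes in the middle row at columns 1 and 4.
image = [
    [50, 80, 110, 140, 170, 200],
    [50, 40, 110, 140, 130, 200],
    [50, 80, 110, 140, 170, 200],
]

print(global_threshold(image, 100)[1])    # fixed cut-off, middle row
print(adaptive_threshold(image)[1])       # local mean, middle row
```

On this toy image the fixed cut-off confuses the shadowed background with ink and misses the stroke on the bright side, while the local comparison picks out both strokes.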
Yelp Data Analysis (Gary, 15 min)
My capstone assignment for the Coursera Data Science Specialisation, looking specifically at the tm package and combining models.

5 November 2015

Monthly Review: What's New? (Andrew, 5 min)
Notes on Data Science. Get the gist.
Feedback from Kaggle Teams (Various, 12 min)
Progress made on Kaggle entries in the last month.
Data Science from a Bus Apps Developer's Perspective (Niels, 15 min)
A "chalk and talk" about why I find Data Science interesting, and what it can bring to Business Applications.
Genetic Algorithms with Python (Suja, 15 min)
An overview of Genetic Algorithms and how they can be implemented using the Distributed Evolutionary Algorithms in Python (DEAP) package.
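For a feel of the moving parts (selection, crossover, mutation), here is a minimal genetic algorithm written with the standard library only. It deliberately does not use DEAP's API; the OneMax toy problem, the function names and all parameter values are assumptions chosen for illustration.

```python
import random


def one_max(bits):
    """Fitness for the OneMax toy problem: the number of 1s in the bit string."""
    return sum(bits)


def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    """Evolve bit strings towards all 1s; returns (best individual, best-fitness history)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=one_max, reverse=True)
        history.append(one_max(population[0]))
        next_gen = [population[0][:]]  # elitism: carry the best individual over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fittest of 3 random individuals becomes a parent.
            parent_a = max(rng.sample(population, 3), key=one_max)
            parent_b = max(rng.sample(population, 3), key=one_max)
            # One-point crossover.
            cut = rng.randrange(1, n_bits)
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=one_max), history


best, history = evolve()
print(one_max(best), history[0], history[-1])
```

Because the best individual is always carried over, the best fitness recorded per generation can never decrease.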

1 October 2015

Monthly Review: What's New? (Etienne, 5 min)
Quick stories from the bleeding edge.
Data Science Competitions (Andrew, 10 min)
Data Science competitions are a great way to hone your skills. I'll be talking about where to find competitions and how to get involved.
SQL Server: Querying made (somewhat) easier (Chris, 15 min)
A beginner level technical session with tips like: 1. Where can I find ...? 2. Three ways of writing multiple similar queries. 3. Looping in SQL. 4. How to play with SQL without doing anything permanent. 5. A few quick tips on making queries easier. Code snippets will be provided.
Data Discovery: Analysis Services and SQL Server (Duncan, 15 min)
Enabling end-user Data Discovery on large data sets is often restricted by performance and size of data. I will demonstrate how you can design for generic data discovery on large amounts of data by leveraging the performance of Analysis Services and flexibility of SQL Server.
Discussion: Data Science Competitions (Andrew, 5 min)
Selection of competitions. Volunteers/election of team leaders.

3 September 2015

A D3 Cook Book: recipes for the lazy (Ryan, 20 min)
A crash course in D3.js. Why everyone loves it, why you don’t want to use it and how c3 and nvd3 can be used to get fast, beautiful results.
Demystifying Game Design (Caroline, 20 min)
Plans to use Machine Learning techniques to understand what makes a successful game.
Where’s the Science in Data Science? (Andrew, 20 min)
Done right, a scientific investigation should apply the "Scientific Method": a set of principles that ensure (or at least improve) the validity of the results. Are these principles being applied in Data Science? If not, how can we make Data Science more rigorously scientific?