Coding Projects & Exercises

datasciY.com

Introduction

This page provides summaries and links to my coding projects and exercises.

Amazon Lambda (AWS)

Sometimes it is really helpful to host a short piece of executable Python code on a public web server. You may have a client with whom you wish to share an idea or a methodology, and a static HTML page may not be enough. Amazon's Lambda makes that easy to do. You can customize Python library access with "layers." I will post a demo using the Ubuntu Linux OS base machine and a Jupyter notebook running Python 3, importing NumPy, Pandas, Matplotlib, and a few other popular data science libraries.
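As a rough idea of what a small hosted function can look like, here is a minimal sketch of a Python handler. It assumes the function is called over HTTP (for example through a function URL or API Gateway), so that the request arrives with a `queryStringParameters` field. The handler name `lambda_handler`, the `values` parameter, and its default are illustrative choices, not part of the planned demo. Only the standard library is used, so no layer is needed for this one.

```python
import json
import statistics


def lambda_handler(event, context):
    """Entry point called by Lambda: `event` holds the request, `context` holds runtime info."""
    # Assumption: the HTTP query string arrives under "queryStringParameters".
    params = (event or {}).get("queryStringParameters") or {}

    # Hypothetical parameter: a comma-separated list of numbers, e.g. ?values=2,4,6
    raw = params.get("values", "1,2,3")
    values = [float(v) for v in raw.split(",")]

    result = {"count": len(values), "mean": statistics.mean(values)}

    # Response shape expected for an HTTP-style invocation.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }


if __name__ == "__main__":
    # Local smoke test with a hand-built event; no AWS account required.
    print(lambda_handler({"queryStringParameters": {"values": "2,4,6"}}, None))
```

Calling the handler locally with a hand-built event, as in the last lines, is a quick way to check the logic before uploading anything.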

Visualization with Dash

Dash is a Python framework for building interactive charting apps for the web. No JavaScript required. Python developers can use many of the chart styles in their default mode to create beautiful, interactive charts. Website visitors can zoom in or out of the chart, seeing detail or summary views. Full customization is available via Plotly's open-source GitHub repo. I will be posting demos using financial and energy trading data.

Data Exploration with NumPy, Pandas and Matplotlib

I will post exercises using NumPy and Pandas to manipulate input data. Then, I will use Matplotlib and Seaborn to visually explore those data. File read/write for JSON, CSV, and Excel formats will be covered.
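Until the exercises are posted, here is a minimal sketch of the kind of read/write round trip described here. The column names and numbers are made-up placeholders, and in-memory buffers stand in for files on disk so the snippet runs anywhere.

```python
import io

import numpy as np
import pandas as pd

# Made-up placeholder values, for illustration only.
df = pd.DataFrame({"day": [1, 2, 3], "load_mw": [10.0, 12.0, 8.0]})

# NumPy functions work directly on Pandas columns.
df["load_centered"] = df["load_mw"] - np.mean(df["load_mw"])

# CSV round trip: write to a string, then read it back.
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# JSON round trip, one record per row.
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text), orient="records")

print(from_csv)
print(from_json)
```

Column types are inferred again on read, so they may not match the original frame exactly. Excel follows the same pattern with `DataFrame.to_excel` and `pd.read_excel`, which need an extra engine such as openpyxl installed.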

Bias-Variance Tradeoff at a Glance

A conceptual picture of the bias-variance tradeoff in machine learning. Source: Pierian Data, Udemy class PyMLDSBC, Section 16 link. Udemy class link.

[Figure: bias-variance tradeoff]
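For readers who prefer a formula to a picture, the tradeoff is usually written as a decomposition of expected squared error. This is the standard textbook form, not taken from the class material. It assumes the data follow $y = f(x) + \varepsilon$ with noise of mean zero and variance $\sigma^2$, and that $\hat{f}$ is the model fitted on a randomly drawn training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

A simple model tends to have a large first term and a small second term. A very flexible model tends to do the opposite. The noise term stays the same either way.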

Titanic Project (ML) DRAFT

Passenger information from the Titanic ship is a common data set used in machine learning (ML). Here I use Python and data science libraries to find patterns in the data and build a prediction model. Then I use various visualization libraries to create pretty figures.

Sorting, Recursion and Big-O Math

Algorithm efficiency is studied using Big-O notation. Generally an order of log(n) is preferred over an order of n, n*log(n), n**3, or n!. For sorting, an order of n, O(n), would be ideal, but this is rarely achieved: a general comparison-based sort cannot do better than O(n*log(n)). An O(n) means that as the number of inputs grows, the time to execute grows linearly. My sorting algorithm uses a binary tree with a central pivot point and a function that calls itself recursively. I use this algorithm to study Big-O math. Example to post.
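A central pivot, two branches, and recursive calls resemble quicksort with a middle pivot. The following is a minimal sketch under that assumption. It is not the example to be posted, and the function name `quicksort` is an illustrative choice.

```python
import random


def quicksort(items):
    """Return a new sorted list, splitting around the middle element."""
    # Base case: zero or one element is already sorted.
    if len(items) <= 1:
        return list(items)

    pivot = items[len(items) // 2]

    # Partition into three groups relative to the pivot.
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]

    # Recurse on both branches; the pivot group sits between them.
    return quicksort(smaller) + equal + quicksort(larger)


if __name__ == "__main__":
    data = random.sample(range(1000), 20)
    print(quicksort(data))
```

Each level of recursion does O(n) work to partition. With reasonably balanced splits there are about log(n) levels, which gives O(n*log(n)) on average. Badly unbalanced splits push the worst case to O(n**2).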

Python Basics

Beginning of Probability Measure Theory

I think one of the most confusing and difficult parts of learning probability measure theory comes at the very beginning! Obviously this project is going to be very opinionated. :-) De Morgan's laws and other rules for calculating probabilities, which come after the beginning, are not that different from normal algebra. I think most people can follow along and understand the other parts, if they do not make the mistake of getting forever stuck on the starting definitions! We need to rename "probability space", "sigma-algebra", and all those Greek letters to something more English-like and easier to remember. Anyway, I will be posting a very opinionated translation from Greek-Math-speak to Normal-English-speak. Stay tuned. :-)
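In the meantime, here is a small sketch of what those starting definitions amount to in the simplest case, a fair six-sided die. A probability space is three things: a set of outcomes, a collection of events (subsets of the outcomes), and a function that assigns each event a probability. The plain-English names `outcomes`, `events`, and `prob` are placeholders for illustration, not the translation promised above.

```python
from fractions import Fraction
from itertools import chain, combinations

# The "sample space": every outcome of one roll of a fair die.
outcomes = frozenset({1, 2, 3, 4, 5, 6})


def all_events(space):
    """Every subset of a finite space; this collection is the largest sigma-algebra on it."""
    items = sorted(space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


events = all_events(outcomes)


def prob(event):
    """The "probability measure": with equally likely outcomes, just count."""
    return Fraction(len(event), len(outcomes))


def complement(event):
    """Everything in the space that is not in the event."""
    return outcomes - event


even = frozenset({2, 4, 6})
high = frozenset({5, 6})

# De Morgan's law: not (A or B) is the same event as (not A) and (not B).
assert complement(even | high) == complement(even) & complement(high)

print(prob(even | high))
```

For a finite space like this, the definitions reduce to ordinary set operations and counting. The heavier machinery only becomes necessary when the set of outcomes is infinite.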