datasciY.com

This page provides summaries of, and links to, my coding projects and exercises.

- Author: Jennifer Yoon
- Contact email: "datasciY.info@gmail.com"
- Resume: data science-PDF
- GitHub repository: datasciY-repo

Sometimes it is really helpful to host a short piece of executable Python code on a public web server. You may have a client with whom you wish to share an idea or a methodology, and a static HTML page may not be enough to show it. AWS Lambda makes that easy to do. You can customize Python library access with "layers." I will post a demo using an Ubuntu Linux base machine and a Jupyter notebook running Python 3 that imports NumPy, pandas, Matplotlib, and a few other popular data science libraries.
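Until the demo is posted, here is a minimal sketch of what a Lambda-hosted Python function can look like. It assumes the function is reached through an HTTP front end (such as an API Gateway proxy integration or a function URL) that passes query parameters under `queryStringParameters`. The parameter name `n` and the squares calculation are placeholders for illustration, not part of the planned demo.

```python
import json


def lambda_handler(event, context):
    """Entry point with the (event, context) signature Lambda calls for Python."""
    # Query parameters may be missing or None, so fall back to an empty dict.
    params = (event or {}).get("queryStringParameters") or {}
    try:
        n = int(params.get("n", "5"))  # "n" is a hypothetical parameter name
    except (TypeError, ValueError):
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "n must be an integer"}),
        }

    # Placeholder computation standing in for the real methodology.
    squares = [i * i for i in range(1, n + 1)]

    # Response shape expected by an HTTP proxy-style integration.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"n": n, "squares": squares}),
    }
```

The handler can be called locally with a plain dictionary as the event, which makes it easy to test before deploying. Heavier libraries such as NumPy or pandas would come from a layer rather than from the function code itself.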

Dash is a Python framework from Plotly for building interactive charts for the web. No JavaScript is required. Python developers can use many of the chart styles in their default mode to create beautiful, interactive charts. Website visitors can zoom in or out of a chart to see detail or summary views. Full customization is available via Plotly's open-source GitHub repo. I will be posting demos using financial and energy trading data.

I will post exercises using NumPy and pandas to manipulate input data. Then, I will use Matplotlib and Seaborn to visually explore those data. File read/write for JSON, CSV, and Excel formats will be covered.

- NumPy Exercise 1 (html), GitHub
- NumPy Exercise 2 (py), GitHub
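
As a small taste of the read/write part, the sketch below builds a table with NumPy and pandas and round-trips it through CSV and JSON text. The numbers are made up for illustration and are not taken from the exercises. In-memory buffers stand in for real files so the snippet runs anywhere. Excel is left out because it needs an extra engine package.

```python
import io

import numpy as np
import pandas as pd

# Made-up illustrative values, not data from the exercises.
prices = np.array([10.0, 12.5, 11.0, 13.5])

df = pd.DataFrame({"day": [1, 2, 3, 4], "price": prices})
# Day-over-day change; the first row has no previous day, so fill it with 0.0.
df["change"] = df["price"].diff().fillna(0.0)

# CSV round trip through an in-memory text buffer instead of a file on disk.
csv_buffer = io.StringIO()
df.to_csv(csv_buffer, index=False)
csv_buffer.seek(0)
df_csv = pd.read_csv(csv_buffer)

# JSON round trip, one JSON object per row.
json_text = df.to_json(orient="records")
df_json = pd.read_json(io.StringIO(json_text), orient="records")

print(df_csv)
print(df_json)
```

Swapping the buffers for file paths, for example `df.to_csv("prices.csv", index=False)`, gives the on-disk version.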

A conceptual picture of the bias-variance tradeoff in machine learning. Source: Pierian Data, Udemy class PyMLDSBC, Section 16 link. Udemy class link.

Passenger information from the Titanic ship is a common data set used in machine learning (ML). Here I use Python and data science libraries to find patterns in the data and build a prediction model. Then I use various visualization libraries to create pretty figures.

- View HTML version of Jupyter notebook: Titanic-NB-HTML
- Download from GitHub, full Jupyter notebook: GitHub Titanic-NB
- Tags: exploratory data analysis (EDA), machine learning (ML), graphics, logistic regression
- Data: Kaggle Titanic data, https://www.kaggle.com/c/titanic/data
- Reference: Rossant, Cyrille, *IPython Interactive Computing and Visualization Cookbook*, 2nd ed., Packt Publishing, 2018, pp. 299-304.

Algorithm efficiency is studied using Big-O notation. Generally, an order of log(n) is preferred over an order of n*log(n), n**3, or n!. For sorting, the best possible order is n, **O(n)**, because every input must be read at least once, but this is rarely achieved; comparison-based sorts need on the order of n*log(n) comparisons in the worst case. An O(n) algorithm is one whose execution time grows linearly as the number of inputs grows. My sorting algorithm uses a binary tree with a central pivot point and a function that calls itself recursively. I use this algorithm to study Big-O behavior. Example to post.
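
Until that example is posted, here is a minimal sketch of a recursive, pivot-based sort, assuming a quicksort-style partition around the middle element. It is not the final algorithm. The name `pivot_sort` and the visit counter are illustrative. The counter records how many times an element is examined, so the growth can be compared with n*log2(n).

```python
import math
import random


def pivot_sort(items, counter):
    """Sort by partitioning around the middle element, then recursing.

    counter is a one-element list; counter[0] is incremented once for
    every element examined during partitioning.
    """
    if len(items) <= 1:
        return list(items)

    pivot = items[len(items) // 2]  # central pivot point
    smaller, equal, larger = [], [], []
    for x in items:
        counter[0] += 1
        if x < pivot:
            smaller.append(x)
        elif x > pivot:
            larger.append(x)
        else:
            equal.append(x)

    # The two recursive calls form the left and right branches of a binary tree.
    return pivot_sort(smaller, counter) + equal + pivot_sort(larger, counter)


rng = random.Random(0)  # fixed seed so runs are repeatable
for n in (1_000, 2_000, 4_000):
    data = [rng.random() for _ in range(n)]
    counter = [0]
    result = pivot_sort(data, counter)
    assert result == sorted(data)
    # If the work grows like n*log2(n), this ratio stays roughly constant.
    print(n, counter[0], round(counter[0] / (n * math.log2(n)), 2))
```

On random input, the ratio in the last column should stay roughly flat as n doubles, which is what n*log(n) growth looks like in practice. An unlucky choice of pivots can push the work toward n**2.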

- Functions - Pass functions as inputs to another function. Function Exercise 1
- Functions - *args, **kwargs, defaults, and parameter order. To do.
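
A minimal sketch of both topics, with made-up function names (`apply_twice`, `describe`) that are not taken from the exercises. The first function takes another function as an input. The second shows the order Python requires in a signature: required parameters, then parameters with defaults, then `*args`, then `**kwargs`.

```python
def apply_twice(func, value):
    """Call func on value, then call func again on the result."""
    return func(func(value))


def describe(required, optional="default", *args, **kwargs):
    """Return how Python distributed the arguments it received."""
    return {
        "required": required,
        "optional": optional,
        "args": args,      # extra positional arguments, collected as a tuple
        "kwargs": kwargs,  # extra keyword arguments, collected as a dict
    }


def add_three(x):
    return x + 3


print(apply_twice(add_three, 10))
print(describe(1))
print(describe(1, 2, 3, 4, key="v"))
```

In the last call, `1` fills `required`, `2` fills `optional`, the leftover positionals `3` and `4` land in `args`, and `key="v"` lands in `kwargs`.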

Reference: www.geeksforgeeks.org

- Class Objects - Spaceship Class, Asteroid Class. To do.
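
As a placeholder until the class exercise is written, here is a minimal sketch of what a Spaceship class and an Asteroid class might look like. Every attribute and method here (`x`, `y`, `radius`, `move`, `collides_with`) is a hypothetical choice for illustration, not the planned design.

```python
import math
from dataclasses import dataclass


@dataclass
class Asteroid:
    """A circular rock at position (x, y). Hypothetical attributes."""
    x: float
    y: float
    radius: float


@dataclass
class Spaceship:
    """A circular ship at position (x, y). Hypothetical attributes."""
    x: float
    y: float
    radius: float = 1.0

    def move(self, dx, dy):
        """Shift the ship by (dx, dy)."""
        self.x += dx
        self.y += dy

    def collides_with(self, asteroid):
        """Two circles touch when their centers are no farther apart
        than the sum of their radii."""
        distance = math.hypot(self.x - asteroid.x, self.y - asteroid.y)
        return distance <= self.radius + asteroid.radius


ship = Spaceship(0.0, 0.0)
rock = Asteroid(5.0, 0.0, 1.0)
print(ship.collides_with(rock))
ship.move(3.5, 0.0)
print(ship.collides_with(rock))
```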

I think one of the most confusing and difficult parts of learning measure-theoretic probability comes at the very beginning! Obviously this project is going to be very opinionated. :-) DeMorgan's Laws and the other rules for calculating probabilities, which come after the beginning, are not that different from normal algebra. I think most people can follow along and understand the other parts, if they do not make the mistake of getting forever stuck on the starting definitions! We need to rename "probability space", "sigma-algebra", and all those Greek letters to something more English-like and easier to remember. Anyway, I will be posting a very opinionated translation from Greek-Math-speak to Normal-English-speak. Stay tuned. :-)
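
In the meantime, here is a tiny sketch of the starting definitions on one fair six-sided die, small enough to check by hand. The plain-English names in the comments are one possible reading, not the translation promised above. On a finite set of outcomes like this one, the sigma-algebra can simply be every subset.

```python
from fractions import Fraction
from itertools import chain, combinations

# "Sample space": every outcome that can happen on one roll.
outcomes = frozenset(range(1, 7))


def power_set(s):
    """Return every subset of s as a set of frozensets."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}


# "Sigma-algebra": the events we are allowed to ask about. Here, all 64 subsets.
events = power_set(outcomes)


def prob(event):
    """"Probability measure": a number for each event. Fair die, so count faces."""
    return Fraction(len(event), len(outcomes))


def complement(event):
    """Everything in the sample space that is not in the event."""
    return outcomes - event


# The defining rules: complements and unions of events are still events.
assert all(complement(e) in events for e in events)
assert all((a | b) in events for a in events for b in events)

even = frozenset({2, 4, 6})
high = frozenset({4, 5, 6})

# DeMorgan's Laws: not (A or B) == (not A) and (not B), and the mirror image.
assert complement(even | high) == complement(even) & complement(high)
assert complement(even & high) == complement(even) | complement(high)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
assert prob(even | high) == prob(even) + prob(high) - prob(even & high)
print(prob(even | high))
```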