Projects

datasciY.com

Introduction

Welcome to my data science projects portfolio.

This projects portfolio is very much a work in progress. When it is complete, my goal is for it to cover the full spectrum of the data science process using Python, a MySQL database, Excel-VBA, Amazon Web Services (AWS) and Google Colab (which provides GPUs for deep learning).

My main interest is applying data science tools to bring value to the financial derivative securities industry, the financial risk management industry and the economic policy-setting industry. I am also interested in visual deep learning as applied to brain image segmentation (e.g., Janelia.org) and to geospatial intelligence analysis (e.g., NGA.org).

The specific analyses I plan to perform include decision trees and random forests, principal component analysis (PCA), k-means clustering, sentiment analysis, natural language processing (NLP), linear regression, logistic regression, time-series analysis, econometrics, big-data cloud computing, deep learning and convolutional neural networks (CNNs).

Visualizing Interactive Charts with Dash and Plotly

Dash is a Python framework for building interactive charting apps for the web, with no JavaScript required. It is built on top of Plotly's charting library, so Python developers can use Plotly's many chart types, in their default styles, to create attractive, interactive charts. Website visitors can zoom in to see details or zoom out for a summary view. Plotly is open source, with its code on GitHub, and every chart can be customized beyond the defaults.

Dash lets you build a web app with your own controls, such as sliders, radio buttons and text inputs, plus user-selected data sorting and filtering. Plotly charts, for their part, come with zoom, pan, expand/collapse and data filtering already built in.

More to follow.

Data Cleaning and Exploring with NumPy and Pandas

I will post exercises using NumPy and Pandas to clean and explore input data. I will cover reading from JSON, CSV and Excel file formats, as well as scraping tables directly from websites, both HTML tables and tables in PDF documents.
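A minimal sketch of that read, clean and explore workflow with pandas follows. The tiny CSV and JSON inputs are hypothetical and held in memory so the example runs without files; with real data you would pass a file path or URL instead. pd.read_excel and pd.read_html follow the same pattern but need extra packages (such as openpyxl for .xlsx files and lxml for HTML), so they appear only in a comment. Reading tables from PDF files needs a third-party tool and is not shown.

```python
import io

import pandas as pd

# Hypothetical inputs kept in memory so the sketch runs as-is.
# With real files: pd.read_csv("prices.csv"), pd.read_excel("prices.xlsx"),
# or pd.read_html(url) for HTML tables (returns a list of DataFrames).
csv_text = (
    "ticker,date,close\n"
    "AAA,2023-01-02,10.5\n"
    "AAA,2023-01-02,10.5\n"   # exact duplicate row
    "BBB,2023-01-02,\n"       # missing close price
    "BBB,2023-01-03,20.1\n"
)
json_text = '[{"ticker": "CCC", "date": "2023-01-02", "close": 5.0}]'

csv_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
json_df = pd.read_json(io.StringIO(json_text), orient="records", convert_dates=["date"])

# Combine both sources, then inspect missing values per column.
prices = pd.concat([csv_df, json_df], ignore_index=True)
print(prices.isna().sum())

# Clean: drop exact duplicate rows and rows without a closing price.
clean = prices.drop_duplicates().dropna(subset=["close"]).reset_index(drop=True)

# Explore: summary statistics of the closing price per ticker.
print(clean.groupby("ticker")["close"].describe())
```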

Titanic Project (ML) DRAFT

Passenger data from the Titanic is a commonly used data set in machine learning (ML). Here I use Python and data science libraries to find patterns in the data and build a prediction model. Then I use various visualization libraries to present the results in clear figures.
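One way to sketch the pattern-finding and modeling steps is with pandas and scikit-learn's logistic regression, one of the techniques listed in the introduction. This is an illustration under stated assumptions, not the finished project: the column names (Survived, Pclass, Sex, Age, Fare) assume the layout of Kaggle's Titanic training file, and the function names are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed column names, following the Kaggle Titanic training file.
FEATURES = ["Pclass", "Sex", "Age", "Fare"]
TARGET = "Survived"


def survival_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Pattern finding: fraction of passengers who survived, by sex and class."""
    return df.pivot_table(index="Sex", columns="Pclass", values=TARGET, aggfunc="mean")


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Encode Sex as 1 (female) / 0 (male) and fill missing ages and fares.

    For brevity the medians come from the whole data set; a stricter
    workflow would compute them on the training split only.
    """
    X = df[FEATURES].copy()
    X["Sex"] = (X["Sex"] == "female").astype(int)
    for col in ["Age", "Fare"]:
        X[col] = X[col].fillna(X[col].median())
    return X


def fit_survival_model(df: pd.DataFrame, test_size: float = 0.25, seed: int = 0):
    """Train a logistic regression; return (model, accuracy on held-out rows)."""
    X = prepare_features(df)
    y = df[TARGET]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)
```

With the training file saved locally (the path here is hypothetical), usage would be df = pd.read_csv("train.csv"), then print(survival_rates(df)) and model, accuracy = fit_survival_model(df). Holding out a stratified test split keeps the reported accuracy honest, since the model never sees those rows during fitting.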

To be continued.

Bias-Variance Tradeoff at a Glance

A conceptual picture of the bias-variance tradeoff in machine learning.

[Image: the bias-variance tradeoff shown as four targets]

A test result with a bias problem is one where the model's predictions are centered away from the true mean, missing it systematically. See the bottom-left target in the image above. A test result with a variance problem is one where the predictions are spread too widely to give the decision maker a meaningful signal. See the top-right target. In most modeling situations there is a tradeoff between hitting the true mean and reducing the spread around it: making a model more flexible usually lowers its bias but raises its variance, and making it simpler does the reverse. The top-left target shows the ideal of low bias and low variance, which is generally not fully achievable. A poor choice of model parameters, however, can easily produce poor results on both counts. See the bottom-right target.
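The four targets can also be made numeric. For repeated predictions of a single true value, the mean squared error splits exactly into bias squared plus variance (using the population variance); the full decomposition of a model's expected error adds an irreducible noise term, which this picture leaves out. The sketch below simulates hypothetical "shots" for each target with NumPy; the offsets and spreads are arbitrary values chosen for illustration.

```python
import numpy as np

TRUE_VALUE = 0.0  # the bull's-eye


def bias_variance(predictions, truth=TRUE_VALUE):
    """Return (bias, variance, mse) for repeated predictions of one true value."""
    predictions = np.asarray(predictions, dtype=float)
    bias = predictions.mean() - truth
    variance = predictions.var()  # population variance (ddof=0)
    mse = np.mean((predictions - truth) ** 2)
    return bias, variance, mse


rng = np.random.default_rng(42)
# Hypothetical (offset from truth, spread) for each target in the image.
scenarios = {
    "low bias, low variance": (0.0, 0.1),    # top-left
    "low bias, high variance": (0.0, 1.0),   # top-right
    "high bias, low variance": (2.0, 0.1),   # bottom-left
    "high bias, high variance": (2.0, 1.0),  # bottom-right
}

results = {}
for name, (offset, spread) in scenarios.items():
    shots = rng.normal(loc=TRUE_VALUE + offset, scale=spread, size=1_000)
    bias, var, mse = bias_variance(shots)
    results[name] = (bias, var, mse)
    # mse equals bias**2 + var, up to floating-point rounding
    print(f"{name:26s} bias={bias:+.2f} variance={var:.2f} mse={mse:.2f}")
```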

Source: Pierian Data, Udemy.com, Python Machine Learning Data Science Boot Camp.