All ten tutorials from SciPy 2020 are now available on YouTube. My top tutorial from the conference is PySAL. I have been trying to use this group of packages to study the geography of socio-economic inequality. PySAL works together with geopandas and geosnap. I may have finally understood enough of the material to start making progress with it. Other tutorials of note are Bayesian Statistics, PyTorch from Scratch, and Dask for easy parallelism.
I didn't like the numpy tutorial with a matplotlib intro. (See the numpy tutorial from SciPy 2020.) It may be fine for some people, but I found it very distracting. The instructor chose to use an IPython shell and type code directly into it rather than a pre-populated Jupyter notebook. There is a PDF file with 2 slides per page in portrait view. The PDF would have been easier to navigate on-screen if it had been saved 1 slide per page in landscape view. I had to constantly scroll and zoom in and out to read the PDF, while also having a shell open in a second window and the live video open in a third. I didn't like the teaching format. The content is very good for beginners, and the PDF is good for self-study outside of the live tutorial format.
SciPy 2020 Keynote, First Image of a Black Hole by Event Horizon Telescope (EHT) Team
Date: July 18, 2020
Keynote EHT team video. The main talk, by Dr. Andrew Chael, starts at the 18:00 mark.
I am writing this post in the middle of this tutorial. The SciPy virtual conference is going great! :-D It's amazing how well it is going, since it's the first virtual conference during the Covid-19 pandemic. YouTube will have the corresponding videos open to the public, maybe in a few days. There are 3 pre-recorded sets of videos already available from Enthought on YouTube. More playlists on biology will be released tonight at 6 PM ET. Enjoy! :-D
Did the Big Bang never happen?
The LLP Fusion YouTube videos I wrote about earlier may be based on a crackpot theory. I have moved the discussion to the General page.
(See moved LLP Fusion.)
A good, recent interview with Andrew Ng. He is a co-founder of Coursera.org and has taught DeepLearning.AI classes on Coursera over the past several years. His comment that he likes to take handwritten notes to learn was interesting. He summarizes what he is listening to rather than writing everything down verbatim. This helps him slow down and actively use his mind to make each concept concrete. The interview was posted on YouTube on Feb 20, 2020.
A Good Book about the daily life of a software developer
There are many YouTube videos out there claiming you can become a software developer or a data scientist with 9 months of concentrated self-study. There are also many boot camps that promise a $100,000+ coding job after finishing their 6-9 month program that costs $20,000 - $30,000. How realistic is the 6-9 month self-study plan or boot camp program? And even after you have done all that work, paid the money, and gotten the job, how do you know if you will be happy with your decision?
That's where this book comes in. YouTube videos can't provide the kind of deep, fact-filled analysis that a full-length book can. The author uses his own experience from working at various programming jobs after graduating from CMU with an engineering degree. He also uses the experience of his college friends and work friends to tell a more holistic story. I listened to the entire book over 2 days. I found it very helpful. While I am focused on a data science career, which is somewhat different from software development, I found it easy to apply the book's lessons to my situation. If you are contemplating a career in software or related fields, I highly recommend this book, Software Developer Life by David Xiang.
OK, now this is my own summary and opinion after having watched many, many YouTube videos on getting a job in software or data science. If you are most interested in getting a $100,000+ job in the shortest time possible, and you have a non-technical background (i.e., you don't have an engineering, statistics, math, or finance degree, a computer science bachelor's or master's, or a PhD in math or physics), then your best chance comes from moving to Silicon Valley and going to a well-known boot camp for Front-End Web Developers. Next, using your newly acquired Silicon Valley network, send out job applications to large companies with plenty of money, e.g., Uber, Netflix, YouTube (Google), Facebook, and Apple. Your success rate will be about 50%, as of the 2018 boot camp graduating class, and declining. But this is still the best success rate available for non-technical people willing to study really hard during the boot camp and hustle like mad afterwards to land that first job. And if you get hired, you will make $100,000 to $115,000 as your starting salary. This option may not be possible for many people. The cost of living in Palo Alto or Mountain View is astronomical. It REALLY helps if you can sleep on someone's couch for free. A tent in someone's backyard costs $1,000 to $2,000 a month to rent. Oh, and the average job search time post boot camp seems to be 4-6 months (for the 50% who were successful), and you will need to continue to support yourself during that time.
For those with a PhD in Math or Physics, the usual target seems to be a Data Science position. It is the highest-paying option without moving to Silicon Valley. They seem to take about 1 to 1.5 years for study and job search, with about 6-9 months of it full-time after quitting their job. Their starting salary seems to be around $90,000 to $125,000 and depends primarily on location. Big cities pay more. There is no information on the number of people with PhDs who failed to land a job as a Data Scientist after putting in the effort. These are highly intelligent and motivated people, and most of them already have jobs. So I would guess the successful transition rate is lower than for the boot camp students who are all-in. Maybe 30% are successful?
For someone between the two options above, there is the low-paying or slow route. Many people with no technical degree have successfully transitioned into Front-End Developer jobs after 1 year of full-time study, mostly via some form of formal schooling. Starting salary outside of Silicon Valley ranges from $35,000 to $50,000. Big cities pay more here too. If you want to become a Data Scientist and you have not programmed before, it will take longer. I am guessing about 2 years of full-time study to become competent in standard Python, the Python data science libraries, machine learning concepts, and the related math. If you have not taken classes in calculus, linear algebra, and probability, you may need to add 6 months. By that time you will know enough to build a data science portfolio, which may take another 6 months. The job search will also take about 6 months, but some of this can be done concurrently while building up your portfolio.
Update - My Learning Experience
Starting in early June 2020, I began to feel confident about my basic Python and machine learning skills. Finally, I felt I had reached a minimum viable skill level to be conversant in this new domain. I am not an expert by any means, but I can start to be useful in a relevant job. In detail, this means I was beginning to feel confident using the Python standard library, NumPy, Pandas, Matplotlib, conda virtual environments, Ubuntu bash commands (WSL), Jupyter notebooks (module and data imports, !shell commands, %magic commands, selecting different Python kernels), and GitHub and Git commands. I had also built up some practice with scikit-learn, TensorFlow, PyTorch, machine learning and deep learning concepts, and the related math and probability.
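For the "selecting different Python kernels" piece, registering a conda environment as a named Jupyter kernel takes only a couple of commands (a sketch, assuming the ipykernel package is installed in that environment; "myenv" is a placeholder name):

```shell
# From the activated conda environment you want Jupyter to see:
conda activate myenv
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"
jupyter kernelspec list    # confirm the new kernel shows up
```

After this, the environment appears in the notebook's Kernel menu under "Python (myenv)".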
Time invested was about 2 years of focused study at 8-12 hours per day for 5-6 days per week. Calendar time elapsed was about 3 years from the beginning of my focused study. I had two 6-month periods when I could not study because of family or medical problems that required all of my attention. There was also a 1-year transition period, from when I set a goal for myself of studying some programming stuff on a more regular basis, to when I had narrowed my goal to Python and Machine Learning and was studying full-time.
As of late July 2020, I believe I will be ready to start a job in Data Science in 4-5 months, which I have reserved for portfolio preparation and job search. If I am successful, then I will have transitioned into a Data Science career in about 2.5 years of focused study, and about 3.5 - 4.5 years of calendar time. This time commitment is comparable to getting a college degree, from a 2-year associate to a 4-year bachelor's degree, followed by 6 months of job search. Dear Reader, your own mileage may vary. Use my experience as one observation point on how long it may take to transition into a Data Scientist job.
I also heard about a new Bayesian approach to statistical inference that sounded interesting. There is a free class taught by Professor Richard McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan. I had a friend who did his PhD on the Bayesian approach to statistics. What I remember is that this approach makes richer use of prior knowledge and estimates, and finds ways to use them directly in the statistical model. Professor McElreath emphasizes that being able to reject a null hypothesis does not necessarily lead to our goal of positively accepting our research hypothesis.
"Rethinking: Is null hypothesis significance testing (NHST) falsificationist? NHST is often identified with the falsificationist, or Popperian, philosophy of science. However, usually NHST is used to falsify a null hypothesis, not the actual research hypothesis. So the falsification is being done to something other than the explanatory model. This seems the reverse from Karl Popper’s philosophy." (See Statistical Rethinking book, page 5.)
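The "richer use of prior knowledge" shows up in even the simplest Bayesian model. A beta-binomial coin-flip update folds the prior counts directly into the model (a toy sketch of my own, not an example from the book or course):

```python
# Beta(a, b) prior on a coin's heads probability, updated with observed flips.
# The posterior is Beta(a + heads, b + tails): prior "pseudo-counts" and the
# observed data are combined directly inside the model.

def beta_binomial_update(a, b, heads, tails):
    """Return the posterior Beta parameters after observing the flips."""
    return a + heads, b + tails

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# A prior equivalent to having already seen 2 heads and 2 tails (mildly
# favoring a fair coin), updated with 7 heads and 3 tails of new data.
a0, b0 = 2, 2
a1, b1 = beta_binomial_update(a0, b0, heads=7, tails=3)
print(beta_mean(a0, b0))  # prior mean: 0.5
print(beta_mean(a1, b1))  # posterior mean: 9/14, about 0.643
```

Note there is no null hypothesis anywhere: the output is a full posterior belief about the parameter, which is exactly the contrast McElreath draws with NHST.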
While I was at the NYC PyData Conference in 2019, I discussed a book about using Machine Learning to make money. The book is Advances in Financial Machine Learning, by Marcos Lopez de Prado (Wiley, 2018).
It is a very opinionated book. And I don't agree with many of the author's views.
But it offers an interesting look inside the mind of a hedge fund manager trying to use Machine Learning to make money in 2018. The author pans using natural language processing for sentiment analysis on earnings calls, and satellite image processing to estimate product deliveries or sales quantities. He thinks the low-hanging fruit from those methods is gone. He believes in processing raw trading data from the exchanges to discover the trading fingerprints of humans vs. algo traders, as well as other classes of traders. These fingerprints can be used to create spread trading strategies where one class of traders can be expected to outperform another, under specific economic conditions. The Machine Learning part is used to automate the raw data processing, where huge volumes are processed for the small nuggets of silver.
The code used in the book's examples seems to be Python, but without PEP 8 styling. A group of people has tried to translate the author's code examples into fully finished coding exercises. See the Github link above.
Update on Long-Term Capital failure:
When I chatted with other attendees at PyData NY about this book, I also explained my take on the Long-Term Capital failure, and why I thought they failed in an unusual way. Since then, I have already gotten 2 comments from people on the Slack channel.
In brief, I read in an article that Fischer Black told someone close to him that the reason he decided not to join Long-Term Capital was that he thought their strategy boiled down to shorting liquidity. I have no way to verify whether he said this. However, after many years of thinking about it, I came to agree that LT Capital failed primarily because they were short liquidity. This is unusual. Most failures are primarily due to market risk or credit risk. Although almost all failures do have a liquidity risk component, this is a short-term effect caused by deteriorating asset values. In most failures, the main invested assets are later discovered to be fundamentally flawed and lose significant value. This did not happen in the LT Capital failure. The fund ran out of time, but the bulk of the positions were later sold at a profit. I was working at the SEC when LT Capital failed and was involved in the wrap-up. I think we can only have an imperfect understanding of what happened then, even though this case has been extensively studied and reported on.
Incidentally, you may also be interested in the book When Genius Failed: The Rise and Fall of Long-Term Capital Management. Amazon link.
Woohoo! It's Hacktoberfest, Octoberfest for hackers, again. Register and submit 4 pull requests.
Get started on creating a habit of frequent Github commits. The first 50,000 participants to finish get a free T-shirt from Digital Ocean. Last year, I got reacquainted with Github through Hacktoberfest and a little help from NOVA Women Who Code.
Update: I finished my four pull requests on October 25, 2019, and was able to get my Hacktoberfest 2019 t-shirt from Digital Ocean.
This year about 60,000 people finished the challenge, and only the first 50,000 got t-shirts. So I was lucky to get one. (My t-shirt arrived yesterday, Nov 14, 2019, yeah!!!)
I also submitted a pull request to geosnap, which is a neighborhood economic analysis package. I had more trouble with this repo because I am unfamiliar with the code base. See the Github geosnap repo.
Understanding Convolutions - Otavio Good's talk on Word Lens
Date: August 19, 2019
Otavio Good's talk on Word Lens at O'Reilly AI conference, Sept. 2017
While studying deep learning with fast.ai, I came across a really good video that demonstrates a convolution in action. In a convolution layer, a small grid crawls across the source image to produce an output layer that combines the source image and a filter array. Each small output value is the dot product of a small scanned input image area and the filter. The filter (also called a kernel) is most often a 3 x 3 array of numbers, frequently representing vertical, horizontal, or diagonal edge detectors. These filters are themselves products of previous machine learning steps. Watch Otavio Good demonstrate how a convolution layer recognizes the letter "A."
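The crawling dot-product scan can be sketched in a few lines of NumPy (a toy example of my own, not code from the talk):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel across an image; each output value is the dot
    product of the kernel and the image patch under it (no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A 3 x 3 vertical-edge filter applied to a tiny image containing a
# vertical edge between its dark left half and bright right half.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)
vertical_edge = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)
print(convolve2d(image, vertical_edge))  # every output value is 3.0
```

Every scanned position straddles the edge, so the filter responds strongly everywhere; on a flat region the same filter would output 0.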
Google acquired the Word Lens app and its development team in May 2014. It's now part of Google Translate. It can translate printed signs and text (not handwriting) using your phone's camera in real time. It's really handy when traveling and trying to make sense of foreign-language public signs on the street and at museums. ;-)
My Books in August 2019
Date: August 15, 2019
This is the current state of my coding bookshelf. The top shelf holds financial coding, Python and R machine learning, coding interview, algorithms, and financial modeling books. The bottom shelf has references for R and C++ coding, financial risk management, and statistics, probability, and stochastic calculus. I like to study from several different books on the same topic. Different authors have varying approaches, and they work best in combination. Jake VanderPlas's Python Data Science Handbook (c 2017) is still my best book for learning the Python data science libraries. It's my go-to book for NumPy, Matplotlib, Scikit-Learn, and Jupyter Notebook (for %magic and !shell commands).
Scientists estimate the time to a working commercial quantum computer at 10 years to maybe never. The error correction needs of qubits pose unknown challenges. A free downloadable study on the state of quantum computing is available from The National Academies Press.
Easy Explanation on How A Quantum Computer Works
Date: March 20, 2019
This is an old video from 2013, but it has an easy-to-understand explanation of how a quantum computer works. 2^n is the number of information states that n qubits can theoretically combine. 2^300 is supposed to be greater than the number of atoms in the universe. But this is only useful for calculations that can make use of the superposition state. Also, to read the final result, the quantum computer must drop back out of the superposition state into the normal state. For normal calculations, a quantum computer is projected to be slower than a regular computer.
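The 2^300 claim is easy to sanity-check with Python's arbitrary-precision integers, using the commonly quoted estimate of roughly 10^80 atoms in the observable universe:

```python
# 2**300 vs. a common ~10**80 estimate for atoms in the observable universe.
n_states = 2 ** 300
atoms_estimate = 10 ** 80

print(len(str(n_states)))         # 2**300 has 91 decimal digits
print(n_states > atoms_estimate)  # True: it dwarfs the atom count
```

So 300 qubits' worth of superposition states is about 10^91, roughly 100 billion times the estimated atom count.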
Beginning of Probability Measure Theory
Date: March 20, 2019
I think one of the most confusing and difficult parts of learning probability measure theory comes at the very beginning! Obviously this project is going to be very opinionated. :-) De Morgan's laws and the other rules for calculating probabilities, which come after the beginning, are not that different from normal algebra. I think most people can follow along and understand the other parts, if they do not make the mistake of getting forever stuck on the starting definitions! We need to rename "probability space", "sigma-algebra", and all those Greek letters to something more English-like and easier to remember. Anyway, I plan to post a very opinionated translation from Greek-Math-speak to Normal-English-speak.
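As a taste of how un-scary the rules really are, De Morgan's laws are just the familiar set identities, which you can check directly on a finite sample space with Python sets (a toy illustration, not measure theory):

```python
# De Morgan's laws on a finite sample space: the complement of a union is
# the intersection of the complements, and the complement of an
# intersection is the union of the complements.
omega = set(range(1, 11))   # sample space: outcomes 1 through 10
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

law1 = omega - (A | B) == (omega - A) & (omega - B)
law2 = omega - (A & B) == (omega - A) | (omega - B)
print(law1, law2)  # True True
```

Swap in any events A and B you like; the identities hold for all of them, which is exactly why the probability calculus built on them feels like ordinary algebra.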
Next Meetup: Convolutional Neural Networks for Visual Recognition, by Stanford University, Chp 1 and 2. CS231n
Reference: Gareth James et al., An Introduction to Statistical Learning with Applications in R. ISLR-website
I gave my first short talk on a data science subject to a local Meetup group this week. Here's a shout-out to the group,
Serious Data Science. Thanks Deborah, Julius, Elsa, Peter, Dan, and others. You guys are so supportive and kind! I don't think I would have read the ISLR book with such attention without all of you helping to keep my motivation high! :-) If you, Reader, live near Sterling, Virginia, please come and join this wonderful Meetup group. We meet monthly on the second Tuesday evening at the REI Systems Inc. building.
GARP 20th Conference in NYC
Date: February 24 - 27, 2019
I will be in NYC attending the 20th GARP Risk Conference. The agenda has several sessions on machine learning and AI along with the usual risk topics. I am interested in learning more about how data science and AI are being used by financial institutions. I will also catch up with my former colleagues from the SEC while I am there. Glad the scheduling worked out.
PyData DC 2018 and SciPy Austin 2018
Date: November 20, 2018
Attended the PyData DC 2018 conference in Tysons Corner, VA over the weekend. I thoroughly enjoyed it. Everybody was very nice and welcoming towards relatively new programmers like myself. I will post a write-up about several talks and software tools that caught my attention. This conference was more accessible for me than SciPy in July 2018 in Austin, TX. I come from a business background and have been learning Python and data science for only about 1.5 years. Many of the people I talked to at PyData had similar backgrounds. The SciPy community was more deeply into core Python package development and consisted of more advanced programmers; the majority seemed to have PhDs in a hard science or math field. For me personally, the learning experience was greater at the SciPy conference, in a "tough love" way. But I felt more of a sense of belonging, and was happier, at the PyData conference. I will also have a write-up of a couple of the tools and talks I found most useful from the SciPy 2018 conference.
DevEnv for Windows - Elegant-SciPy book:
I agreed to help Juan write a Windows version of the "build" instructions for converting Markdown files on GitHub to Jupyter notebooks, which are then saved as HTML or PDF book chapters, with or without output cells. So far I have tested several versions using a conda virtual environment, partial bash tools for Windows, and the new Microsoft Windows Subsystem for Linux (for fully compatible bash scripts).
Note on Jupyter notebooks: the MiKTeX package needs to be installed at "C:/Program Files", and the Windows system PATH environment variable needs to include that directory. MiKTeX allows LaTeX and some markdown formatting codes to work when saving Jupyter notebooks to HTML and PDF formats.
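A minimal version of that Markdown-to-book-chapter pipeline might look like the following (a sketch assuming the notedown and nbconvert packages and a placeholder chapter file name; the book's actual build scripts differ):

```shell
# Convert a Markdown chapter to a Jupyter notebook, execute it to fill in
# the output cells, then export to HTML and PDF (the PDF step is the one
# that needs a LaTeX install such as MiKTeX on the PATH).
notedown chapter1.markdown > chapter1.ipynb
jupyter nbconvert --to notebook --execute chapter1.ipynb --output chapter1_run.ipynb
jupyter nbconvert --to html chapter1_run.ipynb
jupyter nbconvert --to pdf chapter1_run.ipynb
```

Skipping the `--execute` step gives the "without output cells" variant of the chapters.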
Proposed talk to Risk Managers:
Part 1) A quick overview of cool talks from SciPy and PyData conferences.
Part 2) Hands-on practical demos of the most useful AWS tools.
How to host your website on Amazon Route 53
How to run a Python program on Amazon Lambda
How to run a deep learning project on Amazon EC2 (Elastic Compute Cloud)
How to store your files on Amazon S3 (Simple Storage Service)
Bonus - how to share your project on GitHub, and how to find other people's projects.
Refactoring previous code to share on my portfolio
Random Walk charting demo
A sorting algorithm and analysis using Big-O
Game 2048 slider, using my custom images and class objects.
Python IO demo. Writing text files for controlling lab equipment settings.
Uploading previous R code for matrix calculation and data analysis