The struggle continues!
Despite my previously posted hope that I’d be trading soon (at the very least, paper trading), things have kept popping up that need dealing with. At first the issue was that I kept adding or changing things to make them better and more efficient. The more I added or changed, the further I got from having a “working” program. The irony was that the core of it, the code that analyzed and “traded” the stock, was basically done and functioning; the other things I was working on, even if they were improvements, weren’t necessary. So after taking a couple of days off from coding completely, I decided to drop everything, pick up where I was a couple of weeks prior, and get it to the finish line. What I ultimately needed was to take my core functionality and apply it automatically to the entire stock market, not just to individual stocks I requested manually.
So with that new goal in mind and new determination, I set forth. And that’s when I started finding more problems.
The first problem was coming up with a scoring system. The end goal of my program is to analyze the entire market and then spit out the stocks that are the best buys on a given day (or which ones to sell if I’m holding). To achieve that, I needed some way to score the stocks so that they could be compared to each other; it couldn’t simply be a matter of which one made the most. It had to compare which ones had better returns on individual trades, which ones needed fewer trades to achieve the best overall returns, etc. It took quite a while to come up with something I felt was satisfactory and effective.
However, once it was implemented, I found jarring problems: stocks that were almost annihilated scoring as well as stocks that barely moved, for example, or stocks that had gone up a decent amount scoring worse than those that lost half their starting amount. For once it wasn’t a coding problem; it was a methodology problem in the way I was generating the scores. I wanted to keep it as simple as possible, but the simple version was ineffective. So I spent a few more days working on the scoring system, trying out different things and running tests on my server box. Eventually I came up with something that seemed reasonable, and I fixed errors as they appeared, fine-tuning things, applying conditional math, etc.
This in turn, unfortunately, led to a discovery that has sidetracked the project a little bit. My data source for EOD pricing is actually somewhat known on various algo and tech analysis sites as being unreliable; I just didn’t know to what extent. After all, it’s difficult to audit 22,000+ individual stocks on a day-to-day basis to confirm whether they’re accurate or not. But after running my scoring system across the whole U.S. market, I saw a lot of problems that were the result of poor-quality data, such as missing or incorrect pricing or, more troublesome for me, no records for inactive days. Most sites, even Yahoo! Finance, will clearly indicate inactive days by having a 0 (zero) or null value for the volume, and use the last active day’s closing price for the inactive days’ open, high, low and closing values. But this source just didn’t bother to include those days in their files at all. The result was hundreds of stocks with gaps large and small in their historical data, rendering my entire project meaningless and inaccurate.
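For illustration, the inactive-day convention described above (zero volume, with the last active close copied into open, high, low and close) can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not the code used in this project: the dictionary keys, the function names `fill_inactive_days` and `weekdays_between`, and the use of plain weekdays as the trading calendar (market holidays are not excluded) are all made up for the example.

```python
from datetime import date, timedelta


def weekdays_between(start, end):
    """Return Monday-Friday dates from start to end inclusive.

    Assumption: a stand-in for a real trading calendar; holidays are NOT removed.
    """
    days = []
    day = start
    while day <= end:
        if day.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days.append(day)
        day += timedelta(days=1)
    return days


def fill_inactive_days(bars, trading_days):
    """Insert a placeholder bar for every trading day missing from bars.

    bars: list of dicts with hypothetical keys
          date, open, high, low, close, volume.
    Placeholders reuse the last active close and carry a volume of 0.
    Days before the first real bar are left out, since there is no prior close.
    """
    by_date = {bar["date"]: bar for bar in bars}
    filled = []
    last_close = None
    for day in trading_days:
        bar = by_date.get(day)
        if bar is not None:
            filled.append(bar)
            last_close = bar["close"]
        elif last_close is not None:
            filled.append({
                "date": day,
                "open": last_close,
                "high": last_close,
                "low": last_close,
                "close": last_close,
                "volume": 0,
            })
    return filled


# Example: a stock with records on Monday and Thursday only.
sample_bars = [
    {"date": date(2021, 3, 1), "open": 10.0, "high": 10.5,
     "low": 9.8, "close": 10.2, "volume": 1500},
    {"date": date(2021, 3, 4), "open": 10.3, "high": 10.9,
     "low": 10.1, "close": 10.7, "volume": 900},
]
calendar = weekdays_between(date(2021, 3, 1), date(2021, 3, 4))
filled_bars = fill_inactive_days(sample_bars, calendar)
```

In this example Tuesday and Wednesday come back as zero-volume bars priced at Monday's close, so anything downstream sees a continuous daily series and can still tell real trading days from filled ones by checking the volume.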
I can’t be too surprised; as I said, this has been known among other sites and forums, and the data was free as well (although you’d have to assume that the data is the same that the paid subscribers get, along with other paid benefits). At first, I attempted to create artificial entries myself to fill in the gaps, but while doing so I found other, less critical errors, such as stocks changing from one exchange to another, or ticker symbols being reused by different companies; my source didn’t provide any of this data under their free tier. So I figured I was going to have to bite the bullet and sign up for a paid service (elsewhere, of course), and I have been looking around for the past week or so.
Fortunately, I’ve found a couple of potential winners: one is Polygon.io and the other is Tiingo. Both of them have free tiers, which is great for trying them out. And both have an abundance of extra data available, some of which I sorely need, such as dividend and split information (which was a massive undertaking on its own), and some of which I have plans for, such as market/sector information and news summaries (by stock/company). The only drawback is that my previous source delivered data as CSV files, and these two both serve it as JSON. It’s turning out not to be too big a deal, but I am having to rewrite a considerable amount of my code to work with it, and I’m even considering redoing most of my database to keep the new data clean and consistent and to avoid possible conflicts with older data from my original source. Both Polygon and Tiingo limit the number of calls you can make to their servers on the free tier, but the limit is high enough that it shouldn’t hold me back too much while I get back on track.
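As a rough idea of what the CSV-to-JSON switch involves, here is a minimal sketch that flattens a JSON list of daily bars into the same row shape a CSV-based pipeline would expect. The payload layout and field names below are hypothetical and are not copied from Polygon's or Tiingo's documentation; a real response would need its own field mapping. The function names `json_bars_to_rows` and `rows_to_csv` are also invented for the example.

```python
import csv
import io
import json

# Hypothetical field names; a real provider's keys would be mapped here.
FIELDS = ["date", "open", "high", "low", "close", "volume"]


def json_bars_to_rows(payload, symbol):
    """Parse a JSON array of bar objects into flat dicts, one per day.

    Missing fields become None so that gaps stay visible downstream.
    """
    records = json.loads(payload)
    rows = []
    for record in records:
        row = {"symbol": symbol}
        for field in FIELDS:
            row[field] = record.get(field)
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["symbol"] + FIELDS,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up payload for a made-up ticker, for illustration only.
sample_payload = """
[
  {"date": "2021-03-01", "open": 10.0, "high": 10.5,
   "low": 9.8, "close": 10.2, "volume": 1500},
  {"date": "2021-03-02", "open": 10.2, "high": 10.4,
   "low": 10.0, "close": 10.3}
]
"""
rows = json_bars_to_rows(sample_payload, "XYZ")
csv_text = rows_to_csv(rows)
```

Keeping the provider-specific field mapping in one place like this means the rest of the code only ever sees one row shape, whichever service the data came from.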