We are adding at least 10 additional datasets to our main yield curve dataset, combining them by date and using the date format of the main dataset. We ran into a few problems while combining the data. One was finding data that goes back far enough in time that we can keep the main dataset intact when adding the new data. Another was figuring out how to extend data beyond its original time frame so that we have enough data points. We have not run any tests yet because the dataset is not finished, but we are close to done with collecting, cleaning, and tidying the data. Our next steps are to run tests, build our model, and complete the project.
Our original dataset is yield curve data that we found on Kaggle. To it we added Real GDP, Unemployment Rate, a Recession Indicator, Business Confidence Index, Consumer Confidence Index, Copper Price, Industrial Production, Real Median Household Income, Net Export, and Health Expenditure. From each of these datasets, we extracted the variable that the dataset is named after (e.g., our variable “rgdp”, or real GDP, is extracted from the “Real GDP” dataset).
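As a rough illustration of combining by date, the sketch below attaches a lower-frequency series to the rows of the main dataset by carrying the most recent earlier observation forward. This is only one possible way to line up series with different frequencies or time frames, and it is not necessarily the method used in this project. The function name `asof_join`, the column name `spread`, and all values are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def asof_join(main_rows, extra_rows, name):
    """Add column `name` to each main row, using the latest extra value
    dated on or before the row's date (None if there is no such value)."""
    extra_sorted = sorted(extra_rows)            # (date, value) pairs, oldest first
    extra_dates = [d for d, _ in extra_sorted]
    merged = []
    for row in main_rows:
        # Index of the last extra observation that is not after this row's date
        idx = bisect_right(extra_dates, row["date"]) - 1
        value = extra_sorted[idx][1] if idx >= 0 else None
        merged.append({**row, name: value})
    return merged


# Made-up values, for illustration only
yield_curve = [
    {"date": date(1999, 12, 1), "spread": 0.6},
    {"date": date(2000, 1, 3), "spread": 0.5},
    {"date": date(2000, 2, 1), "spread": 0.4},
    {"date": date(2000, 4, 3), "spread": 0.3},
]
rgdp = [(date(2000, 1, 1), 100.0), (date(2000, 4, 1), 101.0)]  # quarterly

combined = asof_join(yield_curve, rgdp, "rgdp")
for row in combined:
    print(row["date"], row["spread"], row["rgdp"])
```

The first row gets `None` because the added series starts later than the main dataset, which is the "not far enough back" problem described above. With pandas, `pandas.merge_asof` performs the same kind of backward-looking join on a sorted date column.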
Original Datasets:
Added Datasets: