Getting Data from Yahoo

For questions and discussion related to reading in and working with data.

Getting Data from Yahoo

Unread postby PeterF » Thu Dec 17, 2020 10:52 am

Dear RATS users

Financial market data from Yahoo is used in some academic working papers or journal articles. It has two advantages, first it is distributed free of charge and second, the most important aspect, it contains a field for the adjusted close, which takes into account the impact of dividend payments, stock splits or rights issues. These data is superior compared to the unadjusted close for calculating daily yields, for example to estimate beta factors.
For other software, applied in quantitative analysis, like R or Python, there are packages available, which could retrieve the data from Yahoo directly into the program, which is very convenient in the case that the data has to be updated more frequently than it is necessary for an academic working paper. This could also be done with a procedure for RATS which is provided below in the code section and also attached as SRC-file.
The procedure allows to load directly one of the data fields Open, High, Low, Close, Adj close or Volume. In the @GetYahoo instruction just set the option fields to the capital letter of the aforementioned data fields. For example, to get the data for the adjusted close, the instruction would be @GetYahoo(fields=”A”). Also the combination OHLC and OHLCAV are valid strings for the fields option. The further parameters are the ticker symbol, the start date and the end date. The procedure calculates the corresponding values for start and end required in URL string to get the data in csv file format. Special characters in the ticker symbol have to be in the required format for a URL. For example, the S&P 500 has the ticker symbol ^SPX but is included in the URL as “%5ESPX”.
One problem is the space in the label for the adjusted close. Thus, the volume has the same label as the last price of the day. This requires a work around. The DATA instruction has the option left, which is used to retrieve the volume. However, in this case, RATS does not take care of the dates and missing data is ignored. Therefore, if the option is set field=”V”, first the adjusted close is retrieved including the NA for market holidays. Then the volume will be retrieved. Before the series is returned, a procedure INSNA is called. This procedure requires the following parameters, name of the series including the NA, name of the series excluding the NA, start and end. The procedure inserts the NA in the second series, and thus, the volume data corresponds to the right dates. For some analysis, the NA for missing data has to be eliminated, for example to calculate yields. Therefore, another procedure DELNA is also supplied. The parameters are the same as for the INSNA procedure.

The @GetYahoo instruction could be enhanced to allow for different data frequencies other than daily. Also a toggle to include or exclude the Adjusted close would be possible.

Code: Select all
*
* Copyright by QCR Quantitative Commodity Research Limited, 2020
* use is free of charge
* distribution is allwoed as long as this copyright section is fully incluedes
*


*
* Procedure to remove NA values in a series
*
PROCEDURE DELNA inseries outseries fstart fende
type series inseries
type series *outseries
type integer fstart
type integer fende

local series filter_missing


*
* remove the NA at market holidays from input series
*
set filter_missing = %valid(inseries)
sample(smpl=filter_missing) inseries fstart fende outseries
end

*
* Procedure to insert NA values in outseries based on inseries
*
PROCEDURE INSNA inseries outseries fstart fend
type series inseries
type series *outseries
type integer fstart
type integer fend

local series rawout
local series filter_missing
local integer i
local integer j

*
* restore the the NA at market holidays from input series
*
set filter_missing = %valid(inseries)
set rawout = outseries
*
* apply structure of inseries to outseries (NA at same dates)
*
comp j = fstart
do i=fstart, fend
   set outseries i i = %if(filter_missing(i)==1,rawout(j),%na)
   comp j = j+fix(filter_missing(i))
end do
end

*
* Procedure to retrieve price data from yahoo
*
Procedure GetYAHOO ticker fstart fend
type string ticker
type integer fstart
type integer fend
option string fields "A"


local integer d m y
local integer juliandiff startsec endsec
local string rooturl starting ending finstring

*
* Components of the URL
*
comp rooturl = "https://query1.finance.yahoo.com/v7/finance/download/"
comp starting = "?period1="
comp ending = "&period2="
comp finstring = "&interval=1d&events=history&includeAdjustedClose=true"

*
* convert the start and end date to a string
* yahoo uses seconds since January 1, 1970
* for getting quotes of the current day, set fende to the current day
* the program will add one days, thus seconds will be until the start of the next day
*
comp d = %day(fstart)
comp m = %month(fstart)
comp y = %year(fstart)
comp juliandiff = %julianfromymd(y,m,d) - %julianfromymd(1970,1,1)
comp startsec = juliandiff*24*3600
comp d = %day(fend+1)
comp m = %month(fend+1)
comp y = %year(fend+1)
comp juliandiff = %julianfromymd(y,m,d) - %julianfromymd(1970,1,1)
comp endsec = juliandiff*24*3600

*
* get the URL
*
comp URL = rooturl+ticker+starting+%string(startsec)+ending+%string(endsec)+finstring

*
* get the data
*
open data &URL
if fields == "O"
{
   data(format=cdf,org=col) fstart fend open
}
if fields == "H"
{
   data(format=cdf,org=col) fstart fend high
}
if fields == "L"
{
   data(format=cdf,org=col) fstart fend low
}
if fields == "C"
{
   data(format=cdf,org=col) fstart fend close
}
if fields == "A"
{
   data(format=cdf,org=col) fstart fend adj
}
if fields == "V"
{
   data(format=cdf,org=col) fstart fend adj
   data(format=cdf,org=col,left=7) fstart fend vol<<"Close"
   @insna adj vol fstart fend
}
if fields == "OHLCAV"
{
   data(format=cdf,org=col,nolabels,skiplines=1) fstart fend open high low close adj vol
}
if fields == "OHLC"
{
   data(format=cdf,org=col,nolabels,skiplines=1) fstart fend open high low close
}
end


Example for S&P 500
Code: Select all
*
* Example for the SPX 500
*

* set sourc instruction to Get Yahoo Data.src

CALENDAR(D) 2019:1:2

comp start  = 2019:1:2
comp ende    = 2020:12:16
comp ticker = "%5ESPX"
comp fields = "V"
@GetYahoo(fields="OHLCAV") ticker start ende


Best regards
PeterF
Attachments
Get Yahoo Data.RPF
Example for the S&P 500
(248 Bytes) Downloaded 192 times
Get Yahoo Data.SRC
Source file with procedures to retrieve Yahoo data
(3.16 KiB) Downloaded 192 times
PeterF
 
Posts: 45
Joined: Thu Apr 12, 2012 2:03 pm

Return to Data: Reading, Writing, Transforming

Who is online

Users browsing this forum: No registered users and 2 guests