Click to show TOC
You can explore the package and explore the some of the core caplbilities by clicking
Package Deep Dive
Here is another interesting example
The area below is an example of a code editor where you can try out the code examples. The editor, nor any part of this page send any data to the server. Everything you see is beeing executed on the browser that does not require a back end server.
Changes you make to the code in any of the exitors can be retained using the browser local storage, that is locally on the machine you are using.
 
This is a database like data manipulation library. It lets you do simple database SQL like operations on in-memory tabular data. The main component of the library is a Frame. This is a data structure that contains tabular data made up of of rows and columns. Each column has a name, and just like the columns of a database table the names must be unique. The tool has been tested with 1 million rows with good performance. The total amount of data is limited by the memory of your browser or node instance. All elements (cells/column) are stored as strings or numbers.
A Frame
That conversion is done by the frameFromBuffer
function. Please note thst the frame is aware that some numbers may be stored as strings, and most of the utilities will treat numbers in string format as if they were numbers. Strictly speaking you can store any any javascript primitive is a cell, but that has not been fully tested and so buyer be aware.
The operations include the ability to do the following:
gb.count
count the number of non-empty values is a groupgb.max
- maximun value of a column in a groupgb.min
- the minimum value of a column in a group gb.min
- the minimum value of a column in a group gb.mean
- the average value of a column in a group (ignoring empty values)gb.stdDev
- the standard deviation value of a column in a group (ignoring empty values)
The inspiration for data frame came from two sources, SQL and the powerful python pandas utility. Frame does not attempt to provide all the functionality of pandas or that of SQL. The examples provided here will give a taste of some of the things it can do.
we have to bring in the str-data-frame npm module. We have several choices, using javascript es6+ module symtax, using webpack or similar bundler, or just as a script. For rhe purposed of this tutorial I am using the good old script method. I creates in the global namesapce DataFrame variable.
Lets do a quick look a what is defined in the package. In this html page we use the following unpkg.com to load the str-data-frame package. The exported name from the Package is DataFrame.
<script src="https://unpkg.com/str-data-frame@0.2.18/dist/bundle.js"></script>
Get the names (keys) in the module using Object.keys(DataFrame)
map the list of names to name/value pair [name, value]
map the [name, value]
to [name
, type of value, display string for the value
]
result is put into data
Note: We use he two following helper function
Display function D(value)
function D(d) {
if(d === undefined ) return "<undef>";
if(typeof d === 'function') return "<func body>";
if(d instanceof Date) return d.toString;
if(typeof d === 'object') return JSON.stringify(d)
return d.toString();
}
function getType(d) {
if(d === undefined) return 'undefined'
if(Array.isArray(d)) return 'Array';
if(isClass(d)) return "class";
return typeof d;
}
So you are in a hurry and would like to see if Frames are yor you. If you have table like data, Data Frames are for you. Think of them as fast in-memory database tables.
We will use world covid-19 daily statistics, this is a csv file that we will split into an array of lines. We can use the built in fetch function to pull data from the public repository on Github. Fetch returns a promise. To learn more about Promise on the link. Once the promise resolves, the data will be available for use. As always last expression in the script will be displayed. If the value is a promise, TryITjs will wait for the promise to resolve and then display the result.
 
Since csvData is just a big string we can use the string.split
opertation to convert it to an array of lines, where each line is a string.
Since the CSV data is just a big string
, to converts it to a frame we have to do the following
csvLine(aLine)
will split a 'CSV' line string into an array of column values. This is also known as a line splitter, the following line splitter existcsvLine
- for CSV filestsvLine
- for tab seperated valuespsvLine
- '|' pipe seperated value. THis si a common format in healthcare The following code is a simplified version of what .frameFromString
does
//frameFromString is at its core the following code
var lines = csvData.split('\n').map(csvLine);
var covidFrame = new Frame(lines.slice(1), lines[0]);
Create a frame from string in the CSV file using the script below. Once we have the Frame we can show the first 20 rows using .slice
similar to slice on a Javascript Array, and as always the last expression is displayed in the results area.
We converted a string to a frames.
obj._toHtml()
and uses that to render the display. In the next section we will see how we can further improve Frame rendering by overriding the _toHtml
method. Although in the tutorial for Frame we will not be using this capability, but it is a useful thing to know for those creating their own TryITjs tutorials.
A Frame has a method ._toHml() to render a frame as HTML. This is a nice Table rendering module DataTable. Details for useing this capability can be found here: DataTable - a plug-in for the jQuery Javascript library.
More details about rendering a DataFrame (or any other object) to HTML:
We can customize the rendering as follows:
 
We will plot some data from the covid Frame for USA
Tryitjs does not have plotting out of the box, but capability was added with the following code.
Click to show the code
<script>
function Identity(x) { return x}
function genID(prefix) { return (prefix || 'T')+Math.round(Math.random()*100000); }
function capitalStr( phrase ) { return phrase.replace(/\b\w/g, c => c.toUpperCase());}
function addDiv(id, ext) {
let targetDivName = `#${id}_${ext}`;
let targetDiv = document.querySelector(targetDivName);
if(!targetDiv) {
targetDiv = document.createElement('div');
targetDiv.id = targetDivName.substr(1);
let sibling = document.querySelector('#'+id);
sibling.after(targetDiv);
}
else targetDiv.innerHTML = '';
return targetDiv;
}
function plotDiv() {
let dn = $$.executeDiv;
return addDiv(dn+'-display','plot'); // add a div for plotting after the display div
}
function HTML(data) {
if(data === undefined || typeof data._toHtml === 'function') return data;
return { _toHtml: () => data }
}
function multiSelect(opts,list) {
let {onchange, id} = opts || {};
id = id || genID();
let html = `
<div id="${id}" class="ui multiple selection dropdown">
<input name="thick" type="hidden">
<i class="dropdown icon"></i>
<div class="default text">100</div>
<div class="menu">
${list.map(([key,val]) => `<div class="item" data-value="${key}">${val}</div>`).join('\n')}
</div>
</div>
`;
onchange = onchange || Identity
$$.lastly(() => {
$('#'+id).dropdown();
if(opts.onchange){
$('#'+id).change( function(e) {
let list = $(e.currentTarget).dropdown("get values");
onchange(list,e);
} );
}
});
return HTML(html);
}
</script>
<script src='https://cdn.plot.ly/plotly-latest.min.js'></script>
<script>
function newPlot(dn, data, layout,config) {
if(typeof dn === 'string') {
Plotly.newPlot(dn, data, layout, config);
return dn;
} else {
let div = genID($$.executeDiv);
let [_data, _layout, _config] = [dn, data, layout]; // no dn parameter
$$.lastly(true,() => Plotly.newPlot(div, _data, _layout, _config));
return HTML(`<div><div id="${div}" /></div>`)
}
}
function simleHistogram(x, layout={}, dn=undefined ) {
var trace = {
x: x,
type: 'histogram',
};
var data = [trace];
newPlot(dn || 'myDiv', data);
}
</script>
In this section we will show how to:
100*total_deaths/total_cases
percent of people who die after contracting COVID-19Note we create a new frame, the riginal frame 'covidFrame' remains unchange. Frames should always be treated as immutable data. Modifying a frame may prevent it working correctly. It is highly recommended that you never manipulate the internal (instance) vatiable of a frame.
 
This section will demonstrate the groupBy capability of DataFrame to create summary of data.
Next we want to rollup the data into continents, for this we will use the group
the data by continent
This is not quite what we need, since this is just giving us the continent names, we also need
to keep the date
column
Next we want to aggrigate the total_cases
and total_deaths
, so we need to sum (add up) the
thos value for every country in a continent and also on that date. We have a operation for that
in the groupBy,
We saw earlier the original 'covidFrame' contains a large quantity of daily covid information for every country. A row in that frame represented a country and date and information related for that day, as well as some fixed (not date dependent) information about the counter. From the 'covidFrame' we create a new frame ('countries') selecting only the columns we need for the task at hand.
The we create a frame for each continent by aggrigation daily data for each country in a continent, (group by 'continent' and 'date'). Further, we created two computed columns for each continent ('total_death_per_million', 'total_cases_per_million'). Note: it is easier to recompute that for a continent rather than extract from the individual nation's column for the same.
In this section we will show how to create summary of the world codiv date to create summary information for countries and continents.
We saw earlier the original 'covidFrame' contains a large quantity of daily covid information for every country. A row in that frame represented a country and date and information related for that day, as well as some fixed (not date dependent) information about the counter. From the 'covidFrame' we create a new frame ('countries') selecting only the columns we need for the task at hand.
Next we
We are going to turn the operation above into a utility function that takes a country frame and and create a continent frame, where the entire continemt structurally looks like a single country as we demonstrated above. But the display can be misleading, since there som countries where the cases and deaths are so high that they skew the result for the other countries in the continent. One way of dealing with the problem is to peel away the country from its continents and to treat the country as a continent in its own right.
The easiest way to do that is to set the country
   
You will notice countries have a 'iso_code' and a 'location' namely the country (loca)
This is another roll up process where the dates are not important, only the largest calues of
total_cases
and total_deaths
We will use the continentSummary frame to build pie charts from the data. Again we will be using the powerful but simple to use Plotly libray.
Lets create a helper function to create a pie charts. This function will use the .rawColumn(colName) method to get a column from a frame as simple javascript array.
Just a note: _ rawColumn will get the values in the column in their format in the frame, if a column has numeric values but is string form, the will remain a string. Plotly will kindly convert strings to numbers, where numbers are required.
Now plot some pie charts. The countries with the most cases are removed from their respective continents and are treated as if they were their own continents. So for example North America excludes the United States.
 
This time we will rollup data into countries and continents
iso_code
and location
continent
columnmortality
this value is computed by dividing total_deaths
by total_cases
In the following section we will show you how to do that using the select
method of a frame.
The countries and continents are similar but not identical, so we cannot combine them into a single table.
iso_code
and location
columns.
To select a few columns from a frame (including new ones)
total_deaths/total_cases
- a computed value
We can create a new frame by concatinating two frames, as long at they hace the same number and order of the columns. In the previous script we created
countries
framecontinents
continents
frame has 'iso_code' and 'location' columnsNow that we have the countries frame and the continents frame in the same shape we can combile the into one table using the concat method
 
Plot the data using the beautiful plotly library. So first thing we need is a plottting function based on the plotly.js library. This library has to do the following:
 
Data Frames have a powerful aggrigation capability in frame.groupBy
, this capability is built on an similar capability in SQL, namely groupby
. The key idea here is to collect
into groups all rows that have the sane value in a column, or a set of columns. On each group we can the apply come aggrication operation on some of thr remaining columns.
This example uses React/jsx for inline rendering using components provided by react-semantic-ui
This section will demonstrate how to build some custom UI. The following use cases will be used.
In order to create the frame we need to do the following
countNonEmptyCells(columnName)
columnType(colName)
In sql we can do this using select columns from table, further the select operation can rename the columns, add new columns, and transform data is columns
We use a similar operation for a data frames using the project
frame.project([ list of columns])
[..., "location=country",...]
We also can reorder the columns by chosing the order of the columns, the example below shows how to create a new frem from a subset of the original columns and renaming a column
Note We can also add new columns by adding a new column name
In the covid table there is a great deal of data redendency and unnecessary data for example for each row has a iso_code
for each country, country name, and continent.
In database terminology we say that the data is not normalized. In this section we will show how we can normalize the data.
This section we will look at how to create the following:
iso_code
or a continent
iso_code
and continent to the value of location
Looking at the results notice the following:
We can fill those values with the value in location. We can do that will the code below:
 
Select the most important columns using Frame.select
method, in this prosess we can:
'location=country'
_rename location to country['INDEX', (value, row) => row.__index$]
-- create and index columnSo the following will create a ne Frame
object with only the selected columns, and also the extra
INDEX
column we have created.
Note: sample(precent)
is s random true,false
genrtator (function) that will return true
on a percent of the calls. This is very useful for sampling the data from a large frame. cFrame
In this there columns in the original data that remains the same for a country and does not change by date (for this dataset). So we can use the Frame.groupBy
method to create a frame that
But first we need to find how many unique values in a column, using a .groupBy operation. The gb.uniqueCount accumulator does not exist. So this section will demonstrate how to use the gb.make method to create our own accumulator. First, the accumulator methods such as gb.count or gb.max are the same as a Array.reduce operation with a small twist.
Consider performing max of an array using reduce:
 As you can see this is not what we wanted, we have the sum and the count, but not the average. We need to perform one final operation to get our desired value.
 Typically after a reduce operation we may need to do some finalization operation, and a reducer function will need to do this. We do thsi by providing an action parameter to the reducer:
Action | Description |
---|---|
1 | perform accumulate action |
2 | perform finalize action |
The format for the reducer (for .groupBy accumulators) is as follows:
function average(action,accum, count, value) {
if(action === 2) return [accum/count, count]; // finalize the result
if(!isNumber(value)) return accum; // ignore non-numeric
return [accum+v, count+1] // add value to accum, increment count
}
The reducer on its own is not enough, we need to wrap it in functionality yo make is a groupBy accumulator there is a handy utility to do just that:
gb.make(reducer_func, initial_value_func, optional_initial_count )
to make a gb. like func we do the following:
let gbAvg = gb.make(average, () => 0);
// now we can use it in a frame
Having learnt about gb accumulators
 
We are going to use uniqueCount to count the unique values for all the
 
The transpose of a frame is analogous to the transpose of a matrix or a spreadsheet block. We have the columns into row are roes into columns. The only twist to is is to choose the columns to use as the headings.
   
Finall if we put everything together we get the following
 
Here we will use a simpler version of select that keeps all the columns as is and filters out some of the rows. This is the same as the following sql
select * from covidFrame where location = 'Brazil'
 
Create a frame with general country information country information into another frame
When we look at the original covidFrame there is a lot of duplication of data. For example, each row of d
 
Frame supports number formattion function.
As you can see the numbers are nicely formatted, Note: this is for display only, the data in the frame is not changed.
Please feel free the edit the code and
This section will reformat the data by continent to create summary by continent
Transpose will rotate the table, the column name comes from the row in the country column. Pay special attention the 'country' column is call the transpose pivot and it is important all the values in this column are unique. If the values are not unique the result of the transpose will look very weird and may violate the requirement that all the column names rae unique.
 
This section will show you how to display large numbers in a human friendly format
To do this we use the frame.update method, this method take an update object as a parameter
We need to find columns that are mostly numeric '90%' or empty, frame.numericColumns will get us the list of those column names
To apply a update function on a column we use the Frame.update method.
This method takes a mapping objec as a paremeter
The mapping object has as the key the coulmn name and value is the function to apply to the coulmn
The function takes two values mappingFunc1(columnValue, rowObject) * The result of the function is set to the column
The mapping obj should look like
{
col1: forceZero,
col2: forceZero,
...
}
What happened
This create and update mapping object. Let us apply this to update the frame (as always this will create a new frame)
 Now I will show you a rather useful function in the DataFrame library
**Before** we do this let us introduce some useful utility function in the DataFrame package __arrProd__. So let look at what we need to do:
 
So using the arrProd becomes
  
As you will notice in the covidFrame there is a lot of redundent data, for example most of the country data is repeated
on ever row for thar country and there is some missing data for example iso_code is not present for every locaton. Since we would like to use the iso_code
as the unique key for every country, let us find the locations with missing iso_code