Click to show TOC
You can explore the package and explore the some of the core caplbilities by clicking
Package Deep Dive
Here is another interesting example
The area below is an example of a code editor where you can try out the code examples. The editor, nor any part of this page send any data to the server. Everything you see is beeing executed on the browser that does not require a back end server.
Changes you make to the code in any of the exitors can be retained using the browser local storage, that is locally on the machine you are using.
 
This is a database like data manipulation library. It lets you do simple database SQL like operations on in-memory tabular data. The main component of the library is a Frame. This is a data structure that contains tabular data made up of of rows and columns. Each column has a name, and just like the columns of a database table the names must be unique. The tool has been tested with 1 million rows with good performance. The total amount of data is limited by the memory of your browser or node instance. All elements (cells/column) are stored as strings or numbers.
A Frame
That conversion is done by the frameFromBuffer
function. Please note thst the frame is aware that some numbers may be stored as strings, and most of the utilities will treat numbers in string format as if they were numbers. Strictly speaking you can store any any javascript primitive is a cell, but that has not been fully tested and so buyer be aware.
The operations include the ability to do the following:
gb.count
count the number of non-empty values is a groupgb.max
- maximun value of a column in a groupgb.min
- the minimum value of a column in a group gb.min
- the minimum value of a column in a group gb.mean
- the average value of a column in a group (ignoring empty values)gb.stdDev
- the standard deviation value of a column in a group (ignoring empty values)
The inspiration for data frame came from two sources, SQL and the powerful python pandas utility. Frame does not attempt to provide all the functionality of pandas or that of SQL. The examples provided here will give a taste of some of the things it can do.
we have to bring in the str-data-frame npm module. We have several choices, using javascript es6+ module symtax, using webpack or similar bundler, or just as a script. For rhe purposed of this tutorial I am using the good old script method. I creates in the global namesapce DataFrame variable.
Lets do a quick look a what is defined in the package. In this html page we use the following unpkg.com to load the str-data-frame package. The exported name from the Package is DataFrame.
<script src="https://unpkg.com/str-data-frame@0.2.18/dist/bundle.js"></script>
Get the names (keys) in the module using Object.keys(DataFrame)
map the list of names to name/value pair [name, value]
map the [name, value]
to [name
, type of value, display string for the value
]
result is put into data
Note: We use he two following helper function
Display function D(value)
function D(d) {
if(d === undefined ) return "<undef>";
if(typeof d === 'function') return "<func body>";
if(d instanceof Date) return d.toString;
if(typeof d === 'object') return JSON.stringify(d)
return d.toString();
}
function getType(d) {
if(d === undefined) return 'undefined'
if(Array.isArray(d)) return 'Array';
if(isClass(d)) return "class";
return typeof d;
}
So you are in a hurry and would like to see if Frames are yor you. If you have table like data, Data Frames are for you. Think of them as fast in-memory database tables.
We will use world covid-19 daily statistics, this is a csv file that we will split into an array of lines. We can use the built in fetch function to pull data from the public repository on Github. Fetch returns a promise. To learn more about Promise on the link. Once the promise resolves, the data will be available for use. As always last expression in the script will be displayed. If the value is a promise, TryITjs will wait for the promise to resolve and then display the result.
 
Since csvData is just a big string we can use the string.split
opertation to convert it to an array of lines, where each line is a string.
Since the CSV data is just a big string
, to converts it to a frame we have to do the following
csvLine(aLine)
will split a 'CSV' line string into an array of column values. This is also known as a line splitter, the following line splitter existcsvLine
- for CSV filestsvLine
- for tab seperated valuespsvLine
- '|' pipe seperated value. THis si a common format in healthcare The following code is a simplified version of what .frameFromString
does
//frameFromString is at its core the following code
var lines = csvData.split('\n').map(csvLine);
var covidFrame = new Frame(lines.slice(1), lines[0]);
Create a frame from string in the CSV file using the script below. Once we have the Frame we can show the first 20 rows using .slice
similar to slice on a Javascript Array, and as always the last expression is displayed in the results area.
We converted a string to a frames.
obj._toHtml()
and uses that to render the display. In the next section we will see how we can further improve Frame rendering by overriding the _toHtml
method. Although in the tutorial for Frame we will not be using this capability, but it is a useful thing to know for those creating their own TryITjs tutorials.
A Frame has a method ._toHml() to render a frame as HTML. This is a nice Table rendering module DataTable. Details for useing this capability can be found here: DataTable - a plug-in for the jQuery Javascript library.
More details about rendering a DataFrame (or any other object) to HTML:
We can customize the rendering as follows:
 
We will plot some data from the covid Frame for USA
Tryitjs does not have plotting out of the box, but capability was added with the following code.
Click to show the code
<script>
function Identity(x) { return x}
function genID(prefix) { return (prefix || 'T')+Math.round(Math.random()*100000); }
function capitalStr( phrase ) { return phrase.replace(/\b\w/g, c => c.toUpperCase());}
function addDiv(id, ext) {
let targetDivName = `#${id}_${ext}`;
let targetDiv = document.querySelector(targetDivName);
if(!targetDiv) {
targetDiv = document.createElement('div');
targetDiv.id = targetDivName.substr(1);
let sibling = document.querySelector('#'+id);
sibling.after(targetDiv);
}
else targetDiv.innerHTML = '';
return targetDiv;
}
function plotDiv() {
let dn = $$.executeDiv;
return addDiv(dn+'-display','plot'); // add a div for plotting after the display div
}
function HTML(data) {
if(data === undefined || typeof data._toHtml === 'function') return data;
return { _toHtml: () => data }
}
function multiSelect(opts,list) {
let {onchange, id} = opts || {};
id = id || genID();
let html = `
<div id="${id}" class="ui multiple selection dropdown">
<input name="thick" type="hidden">
<i class="dropdown icon"></i>
<div class="default text">100</div>
<div class="menu">
${list.map(([key,val]) => `<div class="item" data-value="${key}">${val}</div>`).join('\n')}
</div>
</div>
`;
onchange = onchange || Identity
$$.lastly(() => {
$('#'+id).dropdown();
if(opts.onchange){
$('#'+id).change( function(e) {
let list = $(e.currentTarget).dropdown("get values");
onchange(list,e);
} );
}
});
return HTML(html);
}
</script>
<script src='https://cdn.plot.ly/plotly-latest.min.js'></script>
<script>
/**
* Creates a new Plotly plot in the specified DOM element or generates a new DOM element for the plot.
*
* @param {(string|Array)} dn - The ID of the DOM element where the plot will be rendered.
* If `dn` is a string, the plot is rendered in the existing element with this ID.
* If `dn` is not a string (e.g., an array), a new DOM element is generated for the plot,
* and the function expects the parameters to be passed in the following order:
* `dn` as the data array, `data` as the layout object, and `layout` as the config object.
* @param {Array} data - An array of trace objects that defines the data to be plotted.
* @param {Object} layout - The layout object that defines the layout of the plot (optional).
* @param {Object} config - The configuration options for the plot (optional).
* @returns {(string|Object)} - If `dn` is a string, returns the `dn` string.
* If `dn` is not a string, returns an HTML object containing a div element where the plot is rendered.
*
* @example
* // Example usage with an existing DOM element ID
* newPlot('myDiv', [{x: [1, 2, 3], y: [4, 5, 6], type: 'scatter'}]);
*
* // Example usage without a specified DOM element ID
* newPlot([{x: [1, 2, 3], y: [4, 5, 6], type: 'scatter'}], {}, {});
*/
function newPlot(dn, data, layout,config) {
if(typeof dn === 'string') {
Plotly.newPlot(dn, data, layout, config);
return dn;
} else {
let div = genID($$.executeDiv);
let [_data, _layout, _config] = [dn, data, layout]; // no dn parameter
$$.lastly(true,() => Plotly.newPlot(div, _data, _layout, _config));
return HTML(`<div><div id="${div}" /></div>`)
}
}
function simleHistogram(x, layout={}, dn=undefined ) {
var trace = {
x: x,
type: 'histogram',
};
var data = [trace];
newPlot(dn || 'myDiv', data);
}
</script>
In this section we will show how to:
100*total_deaths/total_cases
percent of people who die after contracting COVID-19Note we create a new frame, the riginal frame 'covidFrame' remains unchange. Frames should always be treated as immutable data. Modifying a frame may prevent it working correctly. It is highly recommended that you never manipulate the internal (instance) vatiable of a frame.
 
This section will demonstrate the groupBy capability of DataFrame to create summary of data.
Next we want to rollup the data into continents, for this we will use the group
the data by continent
This is not quite what we need, since this is just giving us the continent names, we also need
to keep the date
column
Next we want to aggrigate the total_cases
and total_deaths
, so we need to sum (add up) the
thos value for every country in a continent and also on that date. We have a operation for that
in the groupBy,
We saw earlier the original 'covidFrame' contains a large quantity of daily covid information for every country. A row in that frame represented a country and date and information related for that day, as well as some fixed (not date dependent) information about the counter. From the 'covidFrame' we create a new frame ('countries') selecting only the columns we need for the task at hand.
The we create a frame for each continent by aggrigation daily data for each country in a continent, (group by 'continent' and 'date'). Further, we created two computed columns for each continent ('total_death_per_million', 'total_cases_per_million'). Note: it is easier to recompute that for a continent rather than extract from the individual nation's column for the same.
In this section we will show how to create summary of the world codiv date to create summary information for countries and continents.
We saw earlier the original 'covidFrame' contains a large quantity of daily covid information for every country. A row in that frame represented a country and date and information related for that day, as well as some fixed (not date dependent) information about the counter. From the 'covidFrame' we create a new frame ('countries') selecting only the columns we need for the task at hand.
Next we
We are going to turn the operation above into a utility function that takes a country frame and and create a continent frame, where the entire continemt structurally looks like a single country as we demonstrated above. But the display can be misleading, since there som countries where the cases and deaths are so high that they skew the result for the other countries in the continent. One way of dealing with the problem is to peel away the country from its continents and to treat the country as a continent in its own right.
The easiest way to do that is to set the country
   
You will notice countries have a 'iso_code' and a 'location' namely the country (loca)
This is another roll up process where the dates are not important, only the largest calues of
total_cases
and total_deaths
We will use the continentSummary frame to build pie charts from the data. Again we will be using the powerful but simple to use Plotly libray.
Lets create a helper function to create a pie charts. This function will use the .rawColumn(colName) method to get a column from a frame as simple javascript array.
Just a note: _ rawColumn will get the values in the column in their format in the frame, if a column has numeric values but is string form, the will remain a string. Plotly will kindly convert strings to numbers, where numbers are required.
Now plot some pie charts. The countries with the most cases are removed from their respective continents and are treated as if they were their own continents. So for example North America excludes the United States.
 
This time we will rollup data into countries and continents
iso_code
and location
continent
columnmortality
this value is computed by dividing total_deaths
by total_cases
In the following section we will show you how to do that using the select
method of a frame.
The countries and continents are similar but not identical, so we cannot combine them into a single table.
iso_code
and location
columns.
To select a few columns from a frame (including new ones)
total_deaths/total_cases
- a computed value
We can create a new frame by concatinating two frames, as long at they hace the same number and order of the columns. In the previous script we created
countries
framecontinents
continents
frame has 'iso_code' and 'location' columnsNow that we have the countries frame and the continents frame in the same shape we can combile the into one table using the concat method
 
Plot the data using the beautiful plotly library. So first thing we need is a plottting function based on the plotly.js library. This library has to do the following:
 
Data Frames have a powerful aggrigation capability in frame.groupBy
, this capability is built on an similar capability in SQL, namely groupby
. The key idea here is to collect
into groups all rows that have the sane value in a column, or a set of columns. On each group we can the apply come aggrication operation on some of thr remaining columns.
The provided code is a combination of JavaScript and HTML that deals with rendering a user interface (UI) for selecting a country and a continent, along with controlling a plot's Y-axis scale (linear or logarithmic). The code includes dropdowns for selecting these values, handling their events, and triggering a plot update based on the selected options.
Here’s a breakdown of the key components and functionality:
countryDropdown
)
UIcategoryList
. The data is passed to it after grouping by iso_code
, location
, and continent
. The dropdown is populated with countries for selection.selCountry
function is called with the selected country’s iso_code
(ID
). let countryDropdown = UIcategoryList(
countryContinent.groupBy(['iso_code=ID', 'location=title', 'continent=category']).asObjList(),
{
caption: 'Select Country',
onSelect: result => selCountry({ C: result.ID }) // Triggered on country selection
}
);
continentDropdown
)
<select>
dropdown for continents. The options
function generates a list of <option>
elements based on the available continents in countryContinent
. The continent dropdown allows the user to select a continent.onchange
event calls the selCountry
function, passing the selected continent’s value (R
). function options(countryContinent, selected) {
return countryContinent.groupBy(['iso_code', 'location'])
.sort(['location'])
.map(ro => `<option value="${ro.iso_code}" ${ro.location === selected ? 'selected' : ''}>${ro.location}</option>`)
.join('\n');
}
let continentDropdown = `
<p class="ui header huge">Select a Continent</p>
<select class="ui selection dropdown" name="continents" id="continents" onchange="selCountry({ R: this.value })">
${options(countryContinent.filter(ro => ro.iso_code === ro.location), 'North America')}
</select>`;
setLinearOrLog
)
checked
by default).scaleOnChange
function is called, which updates the isLinear
property of the OPTS
object (representing options for the plot). var setLinearOrLog = `<p class="ui header huge">Y Scale</p>
<input class="ui checkbox" type="checkbox" checked id="IsLinear" onchange="scaleOnChange(this)" />
<label for="IsLinear"> Linear Axis</label> <br />
</div>
</div>
</div>`;
function scaleOnChange(element) {
selCountry({ isLinear: element.checked }); // Update plot options
}
4. Rendering the UI
- The
$$.D()
function is used to render the generated HTML elements for the country dropdown, continent dropdown, and Y-scale toggle. These elements are placed in the document.
$$.D(HTML(countryDropdown), HTML(continentDropdown), HTML(setLinearOrLog));
5. Plotting the Data (doPlotLines
)
- The main function for plotting data is
doPlotLines
. It is initially triggered with default values: country USA
, continent North America
, and linear scale.
selCountry
function: This function updates the global OPTS
object whenever a selection is made (either a country, continent, or axis type). It then re-plots the data using the updated values.
- Plot Initialization: At the end of the script, the plot is created and initially rendered with default values.
var OPTS = ({ C: 'USA', R: 'North America', isLinear: true });
var isLinear = true;
function selCountry(opts) {
if (opts) {
OPTS = Object.assign(OPTS, opts); // Merge new options with existing options
}
doPlotLines(countryContinent, [OPTS.C, OPTS.R], dn, OPTS.isLinear, { scrollZoom: true });
}
// Initialize the plot with default country/continent
$$.HTML(``);
$$.lastly(true, () => doPlotLines(countryContinent, ['USA', 'North America'], dn, isLinear, { scrollZoom: true }));
6. Global Variables
OPTS
: This object holds the current state of selected options: C
for country, R
for region/continent, and isLinear
for Y-scale.
dn
: A dynamic identifier used to locate the specific plot container in the DOM.
Summary of the Functionality:
- Country Dropdown: Lets the user select a country from a grouped list.
- Continent Dropdown: Lets the user select a continent, and defaults to "North America".
- Y-Scale Toggle: Allows switching between linear and logarithmic Y-axis scaling for the plot.
- Dynamic Plotting: When the user selects a country, continent, or changes the scale, the plot is updated accordingly.
- Initial Plot: The plot starts with default values (
USA
, North America
, and a linear scale).
Some UI Test
This example uses React/jsx for inline rendering using components provided by react-semantic-ui
- https://react.semantic-ui.com/
This section will demonstrate how to build some custom UI. The following use cases will be used.
- Create a dropdown, for a categoryized list of items
- Using UIcategoryList - a function to generate the necessary HTML
- Show how to respond to a selection
- Use react JSX to create UI components
- Create a simple Recat component (click counter) that manages its
 
Page - 9
More Covid Frame
Show the size of the frame and the column names
- Length of the frame (covidFrame.length)
- All column Names (covidFrame.columns)
- List the columns that are mostly numeric (covidFrame.numericColumns)
 
Show the data summary as a Frame
In order to create the frame we need to do the following
- Function to count the number of non-empty values in a column
countNonEmptyCells(columnName)
- Function to get the type of the column Numeric or Alpha
columnType(colName)
 
Page - 10
Get rid of unnecessary columns
In sql we can do this using select columns from table, further the select operation can rename the columns, add new columns, and transform data is columns
We use a similar operation for a data frames using the project
frame.project([ list of columns])
- We can change a column name with the following code, change location to country
[..., "location=country",...]
We also can reorder the columns by chosing the order of the columns, the example below shows
how to create a new frem from a subset of the original columns and renaming a column
Note We can also add new columns by adding a new column name
Removing data redundency
In the covid table there is a great deal of data redendency and unnecessary data for example for each row has a iso_code
for each country, country name, and continent.
In database terminology we say that the data is not normalized. In this section we will show how we can normalize the data.
This section we will look at how to create the following:
- Create frames with only some of the columns
- Create frames to remove redundent data.
- How to combine data from two frames
But first lets fill some missing data
- iso_code is the unique country code
- Some locations are not countries so they do not have a
iso_code
or a continent
- for thos missing value let us set the
iso_code
and continent to the value of location
 
Lets fix those values
Looking at the results notice the following:
- International does not have an iso_code or a continent
- World does not have a continent
We can fill those values with the value in location. We can do that will the code below:
 
Select a subset of columns
Select the most important columns using Frame.select
method, in this prosess we can:
- Rename columns -
'location=country'
_rename location to country
- Create new columns -
['INDEX', (value, row) => row.__index$]
-- create and index column
So the following will create a ne Frame
object with only the selected columns, and also the extra
INDEX
column we have created.
Note: sample(precent)
is s random true,false
genrtator (function) that will return true
on a percent of the calls. This is very useful for sampling the data from a large frame. cFrame
 
Country Unique Values
In this there columns in the original data that remains the same for a country and does not change by date (for this dataset). So we can use the Frame.groupBy
method to create a frame that
 
Analyze Columns
But first we need to find how many unique values in a column, using a .groupBy operation. The gb.uniqueCount accumulator does not exist. So this section will demonstrate how to use the gb.make
method to create our own accumulator. First, the accumulator methods such as gb.count or gb.max are
the same as a Array.reduce operation with a small twist.
Consider performing max of an array using reduce:
 
As you can see this is not what we wanted, we have the sum and the count, but
not the average. We need to perform one final operation to get our desired value.
 
Typically after a reduce operation we may need to do some finalization operation, and
a reducer function will need to do this. We do thsi by providing an action parameter
to the reducer:
Action
Description
1
perform accumulate action
2
perform finalize action
The format for the reducer (for .groupBy accumulators) is as follows:
function average(action,accum, count, value) {
if(action === 2) return [accum/count, count]; // finalize the result
if(!isNumber(value)) return accum; // ignore non-numeric
return [accum+v, count+1] // add value to accum, increment count
}
The reducer on its own is not enough, we need to wrap it in functionality yo make is a groupBy accumulator
there is a handy utility to do just that:
gb.make(reducer_func, initial_value_func, optional_initial_count )
to make a gb. like func we do the following:
let gbAvg = gb.make(average, () => 0);
// now we can use it in a frame
Having learnt about gb accumulators
 
Count unique values for all columns
We are going to use uniqueCount to count the unique values for all the
 
Using transpose to flip the fame
The transpose of a frame is analogous to the transpose of a matrix or a spreadsheet block. We have the columns
into row are roes into columns. The only twist to is is to choose the columns to use as the headings.
 
 
Common coutry columns
Finall if we put everything together we get the following
 
Get data for Brazil
Here we will use a simpler version of select that keeps all the columns as is and filters out some of the rows. This is the same as the following sql
select * from covidFrame where location = 'Brazil'
 
Page - 11
Split COVID frame
Country frame
Create a frame with general country information country information into another frame
When we look at the original covidFrame there is a lot of duplication of data. For example, each row of d
 
Summary of cases by country
- Use the Frame.groupBy to group all data by country and continent
- For each group (country) do the following
* Get maximum of total_deaths and give the column the name deaths
* Get count of number of rows (using the 'INDEX' colums) and call the column count, Note: count the number of non empty values
* Get maximum of total_cases and call the column cases
* Get maximum of population column, since this value does not change we can just use it as a part of the grouping operation
 
Lets try to improve the display of numbers
Frame supports number formattion function.
- Set a global formatting function for numeric data
Frame.HTMLFormat.number = toStr
- toStr(s: string)
takes a string (or number) and returns a fromatted numeric string with commas add to make is easier to read
- This does not alter the data in the frame
 
As you can see the numbers are nicely formatted, Note: this is for display only, the data in the frame is not changed.
Please feel free the edit the code and
Page - 12
Summarize the information by continent
This section will reformat the data by continent to create summary by continent
- Group By continent
 
Transpose
Transpose will rotate the table, the column name comes from the row in the country column. Pay special attention
the 'country' column is call the transpose pivot and it is important all the values in this column are unique. If the values are not unique the result of the transpose will look very weird and may violate the requirement that all the column names rae unique.
 
Sample 1% (0.01) of the data and show the first 20
- Previously we created cFrame from covidFrame just keeping the most important data
 
- The code / display above demonstrates how to display a sample of the data
Create some utility function to round a numeric string
This section will show you how to display large numbers in a human friendly format
- Since all data is stored as strings or number, empty values are represented ny the empty string
- The Math.round function will convert it to a numeric round the value convert back to a string
 
Make sure no numeric column has empty values
To do this we use the frame.update method, this method take an update object as a parameter
We need to find columns that are mostly numeric '90%' or empty, frame.numericColumns will get
us the list of those column names
To apply a update function on a column we use the Frame.update method.
This method takes a mapping objec as a paremeter
The mapping object has as the key the coulmn name and value is the function to apply to the coulmn
The function takes two values mappingFunc1(columnValue, rowObject)
* The result of the function is set to the column
The mapping obj should look like
{
col1: forceZero,
col2: forceZero,
...
}
- The columns are the numeric columns of the frame, and we can get that cFrame.numericColumns
 
What happened
This create and update mapping object. Let us apply this to update the frame (as always this will create a new frame)
 
Now I will show you a rather useful function in the DataFrame library
**Before** we do this let us introduce some useful utility function in the DataFrame package __arrProd__. So let look at what we need to do:
 
So using the arrProd becomes
 
Create a summary of covid data
 
Normalizing data
As you will notice in the covidFrame there is a lot of redundent data, for example most of the country data is repeated
on ever row for thar country and there is some missing data for example iso_code is not present for every locaton. Since we would like to use the iso_code
as the unique key for every country, let us find the locations with missing iso_code
 
Page - 13