🟩 CalculateStatistic

This is used to check the quality of your data.

Calculates data profiling statistics for the given table. More than 30 statistics are provided, ranging from the percentage of filled and empty cells to the most common values and their counts.

public DataTable CalculateStatistic(DataTable data, CancellationToken cancellationToken)

Parameters


data (DataTable) - DataTable for which the statistics should be calculated
cancellationToken (CancellationToken) - Token that can be used to cancel the calculation

Returns


DataTable - Table with the calculated statistics
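
The signature above can be exercised with a small helper such as the one below. This is a minimal sketch, not the library's own sample: the class `CalculateStatisticExample`, the methods `BuildSampleTable` and `Profile`, and the sample data are hypothetical names introduced for illustration. Because this page does not name the class that exposes CalculateStatistic, the method is passed in as a delegate with the documented signature.

```csharp
using System;
using System.Data;
using System.Threading;

public static class CalculateStatisticExample
{
    // Builds a small table to profile. The columns and values are
    // illustrative only and are not taken from the library.
    public static DataTable BuildSampleTable()
    {
        var table = new DataTable("Customers");
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("City", typeof(string));
        table.Rows.Add("John Smith ", "New.York");
        table.Rows.Add("JANE DOE", DBNull.Value);
        return table;
    }

    // 'calculateStatistic' stands in for the documented method
    // DataTable CalculateStatistic(DataTable, CancellationToken).
    // Pass it as a method group from whichever object exposes it.
    public static DataTable Profile(
        Func<DataTable, CancellationToken, DataTable> calculateStatistic,
        DataTable data,
        TimeSpan timeout)
    {
        // Request cancellation automatically once 'timeout' has elapsed.
        using (var cts = new CancellationTokenSource(timeout))
        {
            return calculateStatistic(data, cts.Token);
        }
    }
}
```

With an object that exposes the documented method (here a hypothetical `profiler`), the call might look like `CalculateStatisticExample.Profile(profiler.CalculateStatistic, table, TimeSpan.FromSeconds(30))`. How the method reacts to a cancellation request is not stated on this page, so callers may want to be prepared for an OperationCanceledException.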

Description of each field in the statistics table:

Column Name - Name of column from the original data table
Type - The declared data type for the column
Filled - The count of records that contain any data
Empty - The percentage of records that are blank
Distinct - The count of all unique values
Trailing Spaces - Number of records that have trailing spaces (e.g. "John Smith ")
Commas - Number of records that contain a comma (e.g. "10, Main Street")
Dots - Number of records that contain dots (e.g. "New.York")
Hyphens - Number of records that contain hyphens (e.g. "0986-5652")
Apostrophes - Number of records that contain apostrophes (e.g. "John's Business")
Leading Spaces - Number of records that have leading spaces (e.g. " John Smith")
Letters - Number of records that only contain letters
Numbers - Number of records that only contain numbers
Non Printables - Number of records that contain non-printable characters. Non-printable characters do not represent a written symbol or part of the text; they are used for signaling and control in character encoding and to indicate formatting actions, such as white space (considered an invisible graphic), carriage returns, tabs, line breaks, page breaks and null characters
With Spaces - Number of records that have any space
Multiple Spaces - Number of records that have more than one space (e.g. " John Smith ")
New Line Char - Number of records that contain a new line character
Tab Char - Number of records that contain a tab character
Punctuation - Number of records that contain punctuation marks. Punctuation marks are: period, comma, question mark, hyphen, dash, parentheses, apostrophe, ellipsis, quotation mark, colon, semicolon, exclamation point
Upper Only - Number of records that contain only upper-case characters (e.g. "JOHN SMITH")
Lower Only - Number of records that contain only lower-case characters (e.g. "john smith")
Proper Case - Number of records that contain both upper and lower case in a standardized format (e.g. "John Smith")
Mixed Case - Number of records that contain both upper and lower case mixed together (e.g. "JoHN SmiTH")
Most Common - The most common value within the column
Most Common Count - The number of times the most common value occurs within the column
Min Number - The lowest number within that column
Max Number - The highest number within that column
Max Words - The maximum number of words
Average Words - The average count of words
Max Length - The maximum length of words
Average Length - The average length of words
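
To make a few of the definitions above concrete, the sketch below computes Filled, Empty, Distinct, Leading Spaces, Trailing Spaces, Most Common and Most Common Count for the values of a single column. It is an illustration of the definitions only, not the library's implementation. The class `ColumnStatisticsSketch` and its methods are hypothetical, and it assumes that null and empty strings count as blank and that blanks are ignored for Distinct and Most Common; the library may decide these cases differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColumnStatisticsSketch
{
    // Assumption: a record is blank when it is null or an empty string.
    private static bool IsBlank(string v) => string.IsNullOrEmpty(v);

    // Filled: count of records that contain any data.
    public static int Filled(IReadOnlyList<string> values) =>
        values.Count(v => !IsBlank(v));

    // Empty: percentage of records that are blank (0 for an empty column).
    public static double EmptyPercent(IReadOnlyList<string> values) =>
        values.Count == 0
            ? 0.0
            : 100.0 * values.Count(v => IsBlank(v)) / values.Count;

    // Distinct: count of unique non-blank values.
    public static int Distinct(IReadOnlyList<string> values) =>
        values.Where(v => !IsBlank(v)).Distinct().Count();

    // Leading Spaces: records whose first character is a space.
    public static int LeadingSpaces(IReadOnlyList<string> values) =>
        values.Count(v => !IsBlank(v) && v[0] == ' ');

    // Trailing Spaces: records whose last character is a space.
    public static int TrailingSpaces(IReadOnlyList<string> values) =>
        values.Count(v => !IsBlank(v) && v[v.Length - 1] == ' ');

    // Most Common / Most Common Count: the most frequent non-blank value
    // and how often it occurs. Ties go to the value seen first.
    public static (string Value, int Count) MostCommon(IReadOnlyList<string> values)
    {
        var top = values.Where(v => !IsBlank(v))
                        .GroupBy(v => v)
                        .OrderByDescending(g => g.Count())
                        .FirstOrDefault();
        if (top == null)
        {
            return (null, 0);
        }
        return (top.Key, top.Count());
    }
}
```

The remaining character-based fields (Commas, Dots, Hyphens and so on) follow the same pattern: a predicate over each record, counted across the column.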

