🟩 CalculateStatistic

This is used to check the quality of your data.

Calculates statistics for the given table. The data profiling statistics cover over 30 measures, ranging from the percentage of filled or empty cells to the most common values and their counts.

public DataTable CalculateStatistic(DataTable data, CancellationToken cancellationToken)

Parameters


Parameter          Type               Definition
data               DataTable          DataTable for which statistics should be calculated
cancellationToken  CancellationToken  Token that can be used to request cancellation of the calculation

Returns


DataTable containing the calculated statistics
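
A call might look like the following minimal sketch. This page does not name the class that exposes CalculateStatistic, so the sketch accepts the method as a delegate with the documented signature. The Profile helper, the sample table, and the timeout are illustrative assumptions, not part of the documented API.

```csharp
using System;
using System.Data;
using System.Threading;

public static class CalculateStatisticUsage
{
    // 'calculateStatistic' stands in for the documented method; pass it as a
    // method group from whichever object exposes CalculateStatistic.
    public static DataTable Profile(
        Func<DataTable, CancellationToken, DataTable> calculateStatistic,
        TimeSpan timeout)
    {
        // Small sample table (hypothetical data, for illustration only).
        var data = new DataTable("Customers");
        data.Columns.Add("Name", typeof(string));
        data.Columns.Add("City", typeof(string));
        data.Rows.Add("John Smith", "New.York");
        data.Rows.Add("JANE DOE ", DBNull.Value);

        // Request cancellation if the calculation runs longer than the timeout.
        using (var cts = new CancellationTokenSource(timeout))
        {
            DataTable statistics = calculateStatistic(data, cts.Token);

            // Print every field of every row of the returned statistics table.
            foreach (DataRow row in statistics.Rows)
            {
                foreach (DataColumn column in statistics.Columns)
                {
                    Console.WriteLine($"{column.ColumnName}: {row[column]}");
                }
                Console.WriteLine();
            }
            return statistics;
        }
    }
}
```

With an object named profiler that exposes the method (a hypothetical name), the call would be CalculateStatisticUsage.Profile(profiler.CalculateStatistic, TimeSpan.FromSeconds(30)).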

Description of each field in the statistics table:

Column Name - Name of column from the original data table
Type - The declared data type for the column
Filled - The count of records that contain any data
Empty - The % of records that are blank
Distinct - The count of all unique values
Trailing Spaces - Number of records that have trailing spaces (e.g. "John Smith ")
Commas - Number of records that contain a comma (e.g. "10, Main Street")
Dots - Number of records that contain dots (e.g. "New.York")
Hyphens - Number of records that contain hyphens (e.g. "0986-5652")
Apostrophes - Number of records that contain apostrophes (e.g. "John's Business")
Leading Spaces - Number of records that have leading spaces (e.g. " John Smith")
Letters - Number of records that only contain letters
Numbers - Number of records that only contain numbers
Non Printables - Number of records that contain non-printable characters. Non-printable characters do not represent a written symbol; they serve as signal and control codes in a character encoding and indicate formatting actions, such as white space (considered an invisible graphic), carriage returns, tabs, line breaks, page breaks and null characters
With Spaces - Number of records that have any space
Multiple Spaces - Number of records that have more than one space (e.g. " John Smith ")
New Line Char - Number of records that contain a new line character
Tab Char - Number of records that contain a tab character
Punctuation - Number of records that contain punctuation marks. Punctuation marks are: period, comma, question mark, hyphen, dash, parentheses, apostrophe, ellipsis, quotation mark, colon, semicolon, exclamation point
Upper Only - Number of records that contain only upper-case characters (e.g. "JOHN SMITH")
Lower Only - Number of records that contain only lower-case characters (e.g. "john smith")
Proper Case - Number of records that contain both upper and lower case in a standardized format (e.g. "John Smith")
Mixed Case - Number of records that contain both upper and lower case mixed together (e.g. "JoHN SmiTH")
Most Common - The most common value within the column
Most Common Count - The number of times the most common value occurs within the column
Min Number - The lowest number within that column
Max Number - The highest number within that column
Max Words - The maximum number of words
Average Words - The average count of words
Max Length - The maximum length of words
Average Length - The average length of words
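
To make a few of these definitions concrete, the sketch below computes Filled, Empty, Distinct, Most Common and Most Common Count for one column and classifies a value's letter case. It is an illustration of the field descriptions only, not the library's implementation. The class and method names, the rule that DBNull and empty strings count as blank, and the word-by-word reading of "proper case" are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class ColumnStatisticsSketch
{
    // Computes a handful of the listed statistics for a single column.
    public static Dictionary<string, object> Summarize(DataTable table, string columnName)
    {
        // Assumption: DBNull and empty strings both count as blank.
        List<string> filled = table.Rows.Cast<DataRow>()
            .Select(row => row[columnName])
            .Where(value => !(value is DBNull))
            .Select(value => value.ToString())
            .Where(text => !string.IsNullOrEmpty(text))
            .ToList();

        int total = table.Rows.Count;
        double emptyPercent = total == 0 ? 0.0 : 100.0 * (total - filled.Count) / total;

        // Group equal values; the largest group is the most common value.
        var mostCommon = filled
            .GroupBy(text => text)
            .OrderByDescending(group => group.Count())
            .FirstOrDefault();

        return new Dictionary<string, object>
        {
            ["Filled"] = filled.Count,
            ["Empty"] = emptyPercent,
            ["Distinct"] = filled.Distinct().Count(),
            ["Most Common"] = mostCommon == null ? "" : mostCommon.Key,
            ["Most Common Count"] = mostCommon == null ? 0 : mostCommon.Count(),
        };
    }

    // Classifies a value by the case of its letters; other characters are ignored.
    public static string ClassifyCase(string text)
    {
        char[] letters = text.Where(c => char.IsLetter(c)).ToArray();
        if (letters.Length == 0) return "No Letters";
        if (letters.All(c => char.IsUpper(c))) return "Upper Only";
        if (letters.All(c => char.IsLower(c))) return "Lower Only";

        // Assumption: proper case means every word starts with an upper-case
        // letter and all of its remaining letters are lower case.
        bool proper = text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(word => word.Where(c => char.IsLetter(c)).ToArray())
            .All(w => w.Length == 0 || (char.IsUpper(w[0]) && w.Skip(1).All(c => char.IsLower(c))));
        return proper ? "Proper Case" : "Mixed Case";
    }
}
```

Under these assumptions, a four-row column holding "John Smith", "John Smith", "JANE DOE" and one null yields Filled 3, Empty 25, Distinct 2, Most Common "John Smith" and Most Common Count 2. ClassifyCase maps the examples from the list above to "Upper Only", "Lower Only", "Proper Case" and "Mixed Case" respectively.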