A Code Page (also referred to as Character Set or Encoding) is a table of values where each character has been assigned a numerical representation. A code page enables a computer to identify characters and display text correctly.
Alteryx supports many code pages that can be selected when inputting and outputting data files via the Input Data Tool and Output Data Tool, or when converting data types using the Blob Convert Tool. Additionally, the ConvertFromCodepage and ConvertToCodepage functions, available within tools that have an expression editor, can use code page identifiers to convert strings between code pages and Unicode® encoding, the universal character-encoding standard for all written characters as created by the Unicode Consortium.
Alteryx assumes that a wide string is a Unicode® string and a narrow string is a Latin 1 string. If you convert a string to a code page, it will not display correctly. Therefore, code pages should only be used to override text encoding issues within a file. Code pages can be different on different computers, or can be changed for a single computer, leading to data corruption. For the most consistent results, use a Unicode® encoding, such as UTF-8 or UTF-16, instead of a specific code page. Unicode allows different languages to be encoded in the same data stream.
UTF-8 is the most portable and compact way to store any character and is used most often. Both UTF-8 and UTF-16 are variable-width encodings, but UTF-8 is compatible with ASCII and the files tend to be smaller than with UTF-16.
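As a rough illustration of the size difference, the Python sketch below encodes the same text both ways. The sample strings and the helper name `encoded_sizes` are arbitrary choices for this example, and Python's built-in codecs stand in for whatever encoder a given tool uses.

```python
# Compare the byte length of the same text in UTF-8 and UTF-16.
# "caf\u00e9" is "cafe" with an acute accent on the final letter.
SAMPLES = ["Alteryx", "caf\u00e9"]


def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (utf8_bytes, utf16_bytes) for text, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for sample in SAMPLES:
    utf8_len, utf16_len = encoded_sizes(sample)
    print(f"{sample!r}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")

# ASCII-only text is byte-for-byte identical in ASCII and UTF-8.
print("Alteryx".encode("utf-8") == "Alteryx".encode("ascii"))
```

For text that is mostly ASCII, the UTF-8 form is about half the size of the UTF-16 form. Text made up largely of East Asian characters can reverse that relationship, which is why the statement above says "tend to be smaller".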
For more information on code pages, see the MSDN Library.
Code page identifiers
These code page identifiers can be used with the ConvertFromCodepage and ConvertToCodepage functions. See Functions.
37 (IBM EBCDIC - U.S./Canada)
437 (OEM - United States)
500 (IBM EBCDIC - International)
708 (Arabic - ASMO)
720 (Arabic - Transparent ASMO)
737 (OEM - Greek 437G)
775 (OEM - Baltic)
850 (OEM - Multilingual Latin I)
852 (OEM - Latin II)
855 (OEM - Cyrillic)
857 (OEM - Turkish)
858 (OEM - Multilingual Latin I + Euro)
860 (OEM - Portuguese)
861 (OEM - Icelandic)
862 (OEM - Hebrew)
863 (OEM - Canadian French)
864 (OEM - Arabic)
865 (OEM - Nordic)
866 (OEM - Russian)
869 (OEM - Modern Greek)
870 (IBM EBCDIC - Multilingual/ROECE (Latin-2))
874 (ANSI/OEM - Thai)
875 (IBM EBCDIC - Modern Greek)
932 (ANSI/OEM - Japanese Shift-JIS)
936 (ANSI/OEM - Simplified Chinese GBK)
949 (ANSI/OEM - Korean)
950 (ANSI/OEM - Traditional Chinese Big5)
1026 (IBM EBCDIC - Turkish (Latin-5))
1047 (IBM EBCDIC - Latin-1/Open System)
1140 (IBM EBCDIC - U.S./Canada (37 + Euro))
1141 (IBM EBCDIC - Germany (20273 + Euro))
1142 (IBM EBCDIC - Denmark/Norway (20277 + Euro))
1143 (IBM EBCDIC - Finland/Sweden (20278 + Euro))
1144 (IBM EBCDIC - Italy (20280 + Euro))
1145 (IBM EBCDIC - Latin America/Spain (20284 + Euro))
1146 (IBM EBCDIC - United Kingdom (20285 + Euro))
1148 (IBM EBCDIC - International (500 + Euro))
1149 (IBM EBCDIC - Icelandic (20871 + Euro))
1250 (ANSI - Central Europe)
1251 (ANSI - Cyrillic)
1252 (ANSI - Latin I)
1253 (ANSI - Greek)
1254 (ANSI - Turkish)
1255 (ANSI - Hebrew)
1256 (ANSI - Arabic)
1257 (ANSI - Baltic)
1258 (ANSI/OEM - Viet Nam)
1361 (Korean - Johab)
10000 (MAC - Roman)
10001 (MAC - Japanese)
10002 (MAC - Traditional Chinese Big5)
10003 (MAC - Korean)
10004 (MAC - Arabic)
10005 (MAC - Hebrew)
10006 (MAC - Greek I)
10007 (MAC - Cyrillic)
10008 (MAC - Simplified Chinese GB 2312)
10010 (MAC - Romania)
10017 (MAC - Ukraine)
10021 (MAC - Thai)
10029 (MAC - Latin II)
10079 (MAC - Icelandic)
10081 (MAC - Turkish)
10082 (MAC - Croatia)
20000 (CNS - Taiwan)
20001 (TCA - Taiwan)
20002 (Eten - Taiwan)
20003 (IBM5550 - Taiwan)
20004 (TeleText - Taiwan)
20005 (Wang - Taiwan)
20105 (IA5 IRV International Alphabet No.5)
20106 (IA5 German)
20107 (IA5 Swedish)
20108 (IA5 Norwegian)
20127 (US-ASCII)
20261 (T.61)
20269 (ISO 6937 Non-Spacing Accent)
20273 (IBM EBCDIC - Germany)
20277 (IBM EBCDIC - Denmark/Norway)
20278 (IBM EBCDIC - Finland/Sweden)
20280 (IBM EBCDIC - Italy)
20284 (IBM EBCDIC - Latin America/Spain)
20285 (IBM EBCDIC - United Kingdom)
20290 (IBM EBCDIC - Japanese Katakana Extended)
20297 (IBM EBCDIC - France)
20420 (IBM EBCDIC - Arabic)
20423 (IBM EBCDIC - Greek)
20424 (IBM EBCDIC - Hebrew)
20833 (IBM EBCDIC - Korean Extended)
20838 (IBM EBCDIC - Thai)
20866 (Russian - KOI8)
20871 (IBM EBCDIC - Icelandic)
20880 (IBM EBCDIC - Cyrillic (Russian))
20905 (IBM EBCDIC - Turkish)
20924 (IBM EBCDIC - Latin-1/Open System (1047 + Euro))
20932 (EUC-JP Japanese - JIS 0208-1990 and 0212-1990)
20936 (Simplified Chinese GB2312)
21025 (IBM EBCDIC - Cyrillic (Serbian, Bulgarian))
21027 (Ext Alpha Lowercase)
21866 (Ukrainian - KOI8-U)
28591 (ISO 8859-1 Latin I)
28592 (ISO 8859-2 Central Europe)
28593 (ISO 8859-3 Latin 3)
28594 (ISO 8859-4 Baltic)
28595 (ISO 8859-5 Cyrillic)
28596 (ISO 8859-6 Arabic)
28597 (ISO 8859-7 Greek)
28598 (ISO 8859-8 Hebrew: Visual Ordering)
28599 (ISO 8859-9 Latin 5)
28603 (ISO 8859-13 Latin 7)
28605 (ISO 8859-15 Latin 9)
38598 (ISO 8859-8 Hebrew: Logical Ordering)
50220 (ISO-2022 Japanese with no halfwidth Katakana)
50221 (ISO-2022 Japanese with halfwidth Katakana)
50222 (ISO-2022 Japanese JIS X 0201-1989)
50225 (ISO-2022 Korean)
50227 (ISO-2022 Simplified Chinese)
50229 (ISO-2022 Traditional Chinese)
51949 (EUC-Korean)
52936 (HZ-GB2312 Simplified Chinese)
54936 (GB18030 Simplified Chinese)
57002 (ISCII - Devanagari)
57003 (ISCII - Bengali)
57004 (ISCII - Tamil)
57005 (ISCII - Telugu)
57006 (ISCII - Assamese)
57007 (ISCII - Oriya)
57008 (ISCII - Kannada)
57009 (ISCII - Malayalam)
57010 (ISCII - Gujarati)
57011 (ISCII - Punjabi (Gurmukhi))
65000 (UTF-7)
65001 (UTF-8)
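To see why choosing the wrong identifier corrupts text, the following Python sketch decodes the same bytes with two different code pages. This is not the Alteryx ConvertFromCodepage or ConvertToCodepage implementation: the lookup table `CODEC_BY_ID` and both helper functions are hypothetical, and the codec names on the right are Python's, assumed here to correspond to the listed identifiers.

```python
# Hypothetical mapping from a few identifiers in the list above to Python codec names.
CODEC_BY_ID = {
    37: "cp037",        # IBM EBCDIC - U.S./Canada
    437: "cp437",       # OEM - United States
    1252: "cp1252",     # ANSI - Latin I
    28591: "iso8859-1", # ISO 8859-1 Latin I
    65001: "utf-8",     # UTF-8
}


def encode_with_codepage(text: str, codepage: int) -> bytes:
    """Turn a Unicode string into bytes using the given code page."""
    return text.encode(CODEC_BY_ID[codepage])


def decode_with_codepage(raw: bytes, codepage: int) -> str:
    """Interpret raw bytes as text in the given code page."""
    return raw.decode(CODEC_BY_ID[codepage])


raw = encode_with_codepage("caf\u00e9", 1252)  # b'caf\xe9'
right = decode_with_codepage(raw, 1252)        # same code page: text survives
wrong = decode_with_codepage(raw, 437)         # different code page: last character changes

print(right == "caf\u00e9")
print(wrong == "caf\u00e9")
```

The bytes never change between the two decode calls. Only the table used to interpret them does, which is the sense in which a code page mismatch between two computers leads to corrupted text.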
Use the Edit button of the Fuzzy Match Tool Configuration window to access the Edit Match Options.
Match Style
Match Style is a predetermined method of finding an appropriate match between records of an input file. The individual match style choices are defined on the Fuzzy Match Tool page.
Any predefined or custom, user-defined match styles will appear in this list. The subsequent specifications in the dialog box will be selected based on the match style chosen.
If you edit a predefined match style, it will change to 'Custom' in the drop down list. The settings specified in this custom match style will save with the workflow.
Add new custom match styles rather than deleting or editing default options.
You can delete a match style by selecting it from the drop down and clicking Delete. You can add a match style by typing in a new name and clicking OK.
Preprocess
Preprocess describes a procedure that runs before Generate Keys and the Fuzzy Match function. Preprocessing should result in better matches. The choices from this list include:
- None: No Preprocess is run.
- Strip Punctuation: Any punctuation characters within the specified data field will be ignored while the tool is determining matches.
- Strip Punctuation & Salutations: Any punctuation characters as well as any titles such as 'MR', 'MS', and 'MRS' within the specified data field are ignored while the tool is determining a match.
- Strip Punctuation & AND, OF & THE: Any punctuation characters as well as any instances of the words 'AND' 'OF' and 'THE' within the specified data field are ignored while the tool is determining matches.
- Strip Punctuation & Remove Units from US Addresses: Any punctuation characters as well as any unit numbers within the specified data field are ignored while the tool is determining matches.
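The effect of the first few preprocess choices can be approximated in a few lines of Python. This is only a sketch: the function names are made up for illustration, the word lists contain just the examples named above, upper-casing the input is an assumption, and the actual rules live in the tool's own configuration.

```python
import string

# Only the examples named in the list above; the real lists may be longer.
SALUTATIONS = {"MR", "MS", "MRS"}
NOISE_WORDS = {"AND", "OF", "THE"}

_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(value: str) -> str:
    """Remove ASCII punctuation characters from the value."""
    return value.translate(_PUNCTUATION_TABLE)


def strip_words(value: str, ignored: set[str]) -> str:
    """Strip punctuation, upper-case, and drop any word found in `ignored`."""
    words = strip_punctuation(value).upper().split()
    return " ".join(word for word in words if word not in ignored)


print(strip_punctuation("O'Brien, Jr."))
print(strip_words("Mrs. Jane O'Neil", SALUTATIONS))
print(strip_words("The Bank of Boulder", NOISE_WORDS))
```

Running the preprocess before key generation means that "Mrs. Jane O'Neil" and "Jane ONeil" reach the later stages as the same text.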
Manual edits to preprocessing
The preprocess can be user-defined by editing the FuzzyMatchStyles.xml file. This file is located in the Alteryx Runtime directory: Program Files\Alteryx\bin\RuntimeData\FuzzyMatch. This file should only be edited by a user who is familiar with XML and Regular Expressions.
Generate Keys
Generate Keys is the method by which a potential match is identified.
Alteryx reads through the specified field and assigns Keys to the components of that field. Once all keys are generated, Alteryx compares the concatenated keys for every match field. If the keys generated are equal for two records, a potential match is identified and the pair will proceed to the next phase of the match process. Function choices are:
- None: Keys for this field are considered when deciding which records match.
- Digits Only: Only records with the same digits in the specified field will be matched.
- Digits Only - Reverse: Only records with the same digits (in the order from last to first) in the specified field will be matched.
- Double Metaphone: An algorithm that codes English words (and foreign words often heard in the English language) phonetically by reducing them to 12 consonant sounds. This reduces matching problems caused by misspellings. Double Metaphone is the preferred method for matching based on sound. It returns two keys if a word has two feasible pronunciations, such as a foreign word. For more information, see Double Metaphone.
- Double Metaphone w/ Digits: Uses the same Double Metaphone algorithm but includes digits as well. When there are digits in the string, the digits in the first token will be the key.
- Soundex: An algorithm to code surnames phonetically by reducing them to the first letter and up to three digits, where each digit is one of six consonant sounds. This reduces matching problems from different spellings.
The algorithm was devised to code names recorded in US census records. The standard algorithm works best on European names. Variants have been devised for names from other cultures. For more information, see Soundex.
- Soundex w/ Digits: Uses the same Soundex algorithm but includes digits as well. When there are digits in the string, the digits in the first token will be the key.
- Whole Field (Case Insensitive): Only records where the entire field matches will be matched. Case is ignored.
- Alphanumeric Only (Case Insensitive): Looks only at alphanumeric characters to make a match. Case is ignored.
- Address Number + Soundex: Removes the address number from a string and applies the Soundex algorithm to the remainder of the field. The Soundex code is then appended to the address number to create a unique key.
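For readers unfamiliar with Soundex, the sketch below implements the standard American Soundex rules in Python (first letter plus up to three digits, padded with zeros). It is a textbook version written for illustration. Whether Alteryx applies exactly these rules is not stated here, so treat the function `soundex` and its output as an assumption about the general technique, not about the tool.

```python
# Six consonant groups, each mapped to one digit.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {letter: digit for letters, digit in _GROUPS for letter in letters}


def soundex(name: str) -> str:
    """Return the 4-character standard Soundex key, or "" if there are no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    previous = _CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate consonants with the same code
        code = _CODES.get(letter, "")  # vowels (and Y) have no code
        if code and code != previous:
            key += code
        previous = code  # a vowel resets `previous`, so a repeated code counts again
        if len(key) == 4:
            break
    return key.ljust(4, "0")


for surname in ("Robert", "Rupert", "Smith", "Smyth", "Tymczak"):
    print(surname, soundex(surname))
```

Names that sound alike, such as Smith and Smyth, receive the same key, so they would be identified as a potential match even though the spellings differ.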
Example (Digits Only): 1-(303)440-8896 would not match 303-440-8896.
Even though non-digit characters are ignored, these phone numbers still do not match because there is a leading 1 in the first record.
Example (Digits Only - Reverse): 1-(303)440-8896 would match 303-440-8896.
Non-digit characters are ignored and the digits are compared from last (6) to first (3 or 1). For these records to match, set Maximum Key Length = 10 so that the leading 1 is ignored.
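The two phone number comparisons above can be reproduced with a small Python sketch. The function `digits_key` and its parameters are hypothetical names chosen for this example. The logic simply keeps the digits, optionally reverses them, and truncates to a maximum key length.

```python
from typing import Optional


def digits_key(value: str, reverse: bool = False,
               max_key_length: Optional[int] = None) -> str:
    """Build a key from the digits in `value`, ignoring every other character."""
    digits = "".join(c for c in value if "0" <= c <= "9")
    if reverse:
        digits = digits[::-1]  # last digit first
    if max_key_length is not None:
        digits = digits[:max_key_length]
    return digits


first = "1-(303)440-8896"
second = "303-440-8896"

# Digits Only: the leading 1 makes the keys differ.
print(digits_key(first) == digits_key(second))

# Digits Only - Reverse with a maximum key length of 10: the leading 1 falls off the end.
print(digits_key(first, reverse=True, max_key_length=10)
      == digits_key(second, reverse=True, max_key_length=10))
```

Reversing before truncating is what makes the difference: the extra leading digit ends up at the end of the reversed key, where the length limit removes it.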
Alteryx automatically replaces the following leading letters and letter combinations prior to generating the match key:
Leading letter(s) | Replacement |
---|---|
AV | AF |
AH | A |
AW | A |
CAAN | TAAN |
DG | G |
D | G |
HA | A |
KN | K |
K | C |
MAC | MC |
M | N |
NST | NS |
PF | F |
PH | F |
Q | G |
SCH | SH |
Z | S |
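The replacement table can be read as a lookup applied to the start of each word. The Python sketch below is one possible reading: it upper-cases the word, tries the longest prefixes first, and applies at most one replacement. Those three choices, and the name `replace_leading`, are assumptions for illustration. The table itself does not say how overlapping entries such as K and KN are resolved.

```python
# Leading letters and their replacements, copied from the table above.
LEADING_REPLACEMENTS = {
    "AV": "AF", "AH": "A", "AW": "A", "CAAN": "TAAN", "DG": "G", "D": "G",
    "HA": "A", "KN": "K", "K": "C", "MAC": "MC", "M": "N", "NST": "NS",
    "PF": "F", "PH": "F", "Q": "G", "SCH": "SH", "Z": "S",
}

# Assumption: longer prefixes win, so "KN" is tried before "K".
_PREFIXES = sorted(LEADING_REPLACEMENTS, key=len, reverse=True)


def replace_leading(word: str) -> str:
    """Apply at most one leading-letter replacement to the upper-cased word."""
    upper = word.upper()
    for prefix in _PREFIXES:
        if upper.startswith(prefix):
            return LEADING_REPLACEMENTS[prefix] + upper[len(prefix):]
    return upper


for word in ("Philip", "Knight", "MacDonald", "Schmidt", "Smith"):
    print(word, "->", replace_leading(word))
```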
Generate Keys for Each Word: Generates a separate key for each word.
Ignore if Empty: Ignores an empty value of the specified match field. If the field is empty, then no key will be generated and the record will be thrown out.
Maximum Key Length: Specify the maximum length of the key to consider for the match.
Match Function
The Match function is a more granular process by which a match is identified, and a score is applied. This differs from keys, which must match exactly. Choices are:
- None - Key Match Only: Looks only at the Key Generation specifications.
- Levenshtein Distance: The smallest number of insertions, deletions, and substitutions required to change one string or tree into another. When the Levenshtein Distance is selected, the match score will be significantly lower due to differences. For more information, see Levenshtein Distance.
- Jaro Distance: A measure of similarity between two strings. The Jaro measure is the weighted sum of percentage of matched characters and necessary transpositions. The Jaro Distance is more forgiving than the Levenshtein Distance with respect to difference in strings. For more information, see Jaro-Winkler.
- Best of Jaro & Levenshtein: Both match types are analyzed and the better score is taken.
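The difference between the two measures is easier to see with numbers. The Python sketch below implements the textbook Levenshtein distance and Jaro similarity. Converting the Levenshtein distance to a 0 to 1 score by dividing by the longer string's length is an assumption made here for comparison. How Alteryx scales its match score is not described above, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Smallest number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                       # deletion
                               current[j - 1] + 1,                    # insertion
                               previous[j - 1] + (char_a != char_b))) # substitution
        previous = current
    return previous[-1]


def levenshtein_score(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for nothing in common."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaro(a: str, b: str) -> float:
    """Jaro similarity between 0.0 and 1.0."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_used, b_used = [False] * len(a), [False] * len(b)
    matches = 0
    for i, char_a in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_used[j] and b[j] == char_a:
                a_used[i] = b_used[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order, then halve.
    out_of_order, k = 0, 0
    for i, char_a in enumerate(a):
        if a_used[i]:
            while not b_used[k]:
                k += 1
            out_of_order += char_a != b[k]
            k += 1
    transpositions = out_of_order // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def best_of(a: str, b: str) -> float:
    """Take the higher of the two scores."""
    return max(jaro(a, b), levenshtein_score(a, b))


print(levenshtein("MARTHA", "MARHTA"), round(levenshtein_score("MARTHA", "MARHTA"), 3))
print(round(jaro("MARTHA", "MARHTA"), 3))
```

For MARTHA and MARHTA, two swapped letters cost two substitutions under Levenshtein, while Jaro treats them as a single transposition and so returns the higher score. That is the sense in which Jaro is more forgiving.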
Function types
- Word-based (Match Function begins with 'Words:') functions look at any words within the specified field, regardless of the order the words are in.
- Non-word-based functions match against the entire string as a whole.
- For word & digit functions, all tokens that have digits in them must be in both sides to consider a match. These would typically be used for addresses.
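The distinction between the function types can be sketched in Python as follows. The helper names are hypothetical, splitting on whitespace and upper-casing are assumptions, and a real word-based function would score partial word matches instead of requiring exact equality.

```python
def words(value: str) -> list[str]:
    """Split a field into upper-cased, whitespace-separated tokens."""
    return value.upper().split()


def same_words_any_order(a: str, b: str) -> bool:
    """Word-based view: the same words match regardless of their order."""
    return sorted(words(a)) == sorted(words(b))


def same_whole_string(a: str, b: str) -> bool:
    """Non-word-based view: the field is compared as one string."""
    return a.upper() == b.upper()


def digit_tokens_agree(a: str, b: str) -> bool:
    """Word & digit rule: every token containing a digit must be on both sides."""
    def digit_tokens(value: str) -> set[str]:
        return {t for t in words(value) if any(c.isdigit() for c in t)}
    return digit_tokens(a) == digit_tokens(b)


print(same_words_any_order("Smith John", "John Smith"))
print(same_whole_string("Smith John", "John Smith"))
print(digit_tokens_agree("123 Main St", "Main Street 123"))
print(digit_tokens_agree("123 Main St", "125 Main St"))
```

The digit rule is why word & digit functions suit addresses: "123 Main St" and "125 Main St" share most of their text, but the differing house number rules them out.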
Word-based function options
- When Using Word Based Match, also use: Specify an additional match method that produces a second score. The better of the two scores is used, which eliminates the need to run two instances of the Fuzzy Match tool:
- None: Uses the word based score only.
- Character: Uses the word-based match score in addition to a character match function. Two scores are generated and the best match score is used to identify the match.
- Character (No Spaces): Same as above, but spaces are ignored when generating the character-based match.
- Word Frequency Statistics (Word Match Only): You can specify a Word Frequency table based on predefined statistics. When specified, the words that appear in the database carry less importance when they are present in the incoming data, and the match score will be adjusted accordingly. Options include:
- [None]: No Word Frequency Statistics are used.
- Name: Contains frequent words in a name field. The frequency inversely relates to how important those words are in the match score.
- US Address: Contains frequent words in a US Address field. The frequency inversely relates to how important those words are in the match score.
- US Company: Contains frequent words in a Company Name field. The frequency inversely relates to how important those words are in the match score.
- Nickname/Abbreviation Table (Word Match Only): Use a common Nickname table to check against and further identify duplicates. Use this option on fields containing either only the first name or both the first and last names.
Add additional nicknames and abbreviations:
- Update the Common Nicknames.yxdb database found at Program Files\Alteryx\bin\RuntimeData\FuzzyMatch\Nicknames
- Any .yxdb files placed in this directory will become available from the drop down box in the Nicknames section of the Fuzzy Match tool.
Match 'Albert Commette' to 'Albert Commette MD.'
The Word Frequency Statistics table for 'Name' includes the word 'MD.' When Word Frequency: Name is specified, the resulting match score is roughly 5 points higher than if Word Frequency: Name is not specified.
Word Frequency Statistics are stored in Alteryx Database files (*.yxdb) located in the RuntimeData directory:
Program Files\Alteryx\bin\RuntimeData\FuzzyMatch
You can also create your own Word Frequency Statistics by editing the workflow CollectStats.yxmd located in the same directory.
- Penalty: Set the penalty percentage applied when a match is made with data from the Nickname table. The default value is 15%. A penalty is recommended as a nickname match is another potential source of error. The penalty percent will be subtracted from the match score prior to comparison with the match threshold.
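As a worked example of the penalty, the Python sketch below subtracts the penalty from the score before comparing it with the threshold. It assumes the penalty is subtracted as percentage points (a score of 90 with a 15% penalty becomes 75) and that a score equal to the threshold counts as a match. Both are interpretations of the sentence above, and the function name is hypothetical.

```python
def passes_threshold(score: float, threshold: float,
                     used_nickname: bool, penalty: float = 15.0) -> bool:
    """Apply the nickname penalty (if any), then compare with the match threshold."""
    adjusted = score - penalty if used_nickname else score
    return adjusted >= threshold


# Same raw score and threshold; only the nickname lookup differs.
print(passes_threshold(90.0, 80.0, used_nickname=False))
print(passes_threshold(90.0, 80.0, used_nickname=True))
```

Under these assumptions, a pair matched through the Nickname table needs a raw score of at least 95 to clear an 80% threshold with the default penalty.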
Match Threshold: Set the allowable uncertainty percentage to return a match for a particular field.
Match Weight: Apply importance to the field, causing the field to be considered more or less strongly during a match.
For additional information regarding Fuzzy Match use, see the Fuzzy Match FAQ.