Term Definition Contribute
abstract syntax tree A deeply nested data structure, or tree, that represents the structure of a program. For example, the AST might have a node representing a while loop with one child representing the loop condition and another representing the loop body.
affordance A feature of an interface that users can interact with, such as a handle or button.
Anaconda Anaconda is a [software distribution](#software_distribution) of R and Python. It is also a [repository](#repository) of open-source Python and R programs for data science, packaged using the conda [package manager](#package_manager). Anaconda also creates Anaconda Navigator, a suite of desktop tools including an [IDE](#ide) and the Jupyter Notebook application.
append mode To add data to the end of an existing file instead of overwriting the previous contents of that file. Overwriting is the default, so most programming languages require programs to be explicit about wanting to append instead.
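A minimal Python sketch of the difference between overwriting and appending. The file name `log.txt` and the helper `write_lines` are hypothetical, chosen only for illustration; the mode strings `"w"` and `"a"` are Python's built-in `open` modes.

```python
import os
import tempfile

def write_lines(path, lines, mode):
    # mode "w" overwrites the file; mode "a" appends to the end of it.
    with open(path, mode) as handle:
        for line in lines:
            handle.write(line + "\n")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.txt")  # hypothetical file name

    write_lines(path, ["first"], "w")    # creates the file
    write_lines(path, ["second"], "a")   # explicit append keeps "first"
    with open(path) as handle:
        appended = handle.read()

    write_lines(path, ["third"], "w")    # overwriting discards earlier lines
    with open(path) as handle:
        overwritten = handle.read()
```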
artificial intelligence (AI) Intelligence demonstrated by machines, as opposed to humans or other animals. AI can be exhibited through perceiving, synthesizing, and inferring information. Example tasks include natural language processing, computer vision, and [machine learning](#machine_learning).
assertion A Boolean expression that must be true at a certain point in a program. Assertions may be built into the language (e.g., Python’s assert statement) or provided as functions (e.g., R’s stopifnot). They are often used in testing, but are also put in production code to check that it is behaving correctly. In many languages, assertions should not be used to perform data validation as they may be silently dropped by compilers and interpreters under optimization conditions. Using assertions for data validation can therefore introduce security risks. Unlike many languages, R does not have an assert statement which can be disabled, and so use of a package such as assertr for data validation does not create security holes.
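As a rough Python illustration of the distinction drawn above, the sketch below uses `assert` for an internal sanity check and an explicit exception for data validation. The function names `mean` and `parse_age` are hypothetical.

```python
def mean(values):
    # Internal sanity check: callers are expected never to pass an empty list.
    # Python removes assert statements when run with the -O option.
    assert len(values) > 0, "mean() requires at least one value"
    return sum(values) / len(values)

def parse_age(text):
    # Data validation raises an explicit exception, which is never stripped out.
    age = int(text)
    if age < 0:
        raise ValueError(f"age must be non-negative, got {age}")
    return age

average = mean([2, 4, 6])
```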
associative array See dictionary.
automatic variable A variable that is automatically given a value in a build rule. For example, Make automatically assigns the name of a rule’s target to the automatic variable $@. Automatic variables are frequently used when writing pattern rules.
Bayes' Rule See Bayes’ Theorem.
binary expression An expression that applies an operator to two operands, such as 1 + 2.
binary large object Data that is stored in a database without being interpreted in any way, such as an audio file. The term is also now used to refer to data transferred over a network or stored in a version control repository as uninterpreted bits.
binomial distribution A probability distribution that arises when there are a fixed number of trials, each of which can produce one of two outcomes, and the probability of those outcomes does not change. As the number of trials increases, the binomial distribution approximates a normal distribution.
block comment A comment that spans multiple lines. Block comments may be marked with special start and end symbols, like /* and */ in C and its descendants, or each line may be prefixed with a marker like #.
boilerplate Standard text that is included in legal contracts, licenses, and so on. Also, parts of source code that have to be repeated very often to get basic functionality. In object-oriented programs these segments are often used to encapsulate the variables of objects. Some languages, such as Java, require a lot of these statements and are termed boilerplate languages; in practice, such code is frequently generated automatically or written using auto-completion.
branch See Git branch.
branch-per-feature workflow A common strategy for managing work with Git and other version control systems in which a separate branch is created for work on each new feature or each bug fix and merged when that work is completed. This isolates changes from one another until they are completed.
breadth first To go through a nested data structure such as a tree by exploring all of one level, then going on to the next level and so on, or to explore a problem by examining the first step of each possible solution, and then trying the next step for each.
browser cache A place where web browsers keep copies of previously retrieved files (web pages, data files) in order to save time when they’re requested again. Sometimes, issues may arise if there is a newer version of the file online, but the browser doesn’t notice it.
bug report A collection of files, logs, or related information that describes either an unexpected output of some code or program, or an unexpected error or warning. This information is used to help find and fix a bug in the program or code.
bug tracker A system that tracks and manages reported bugs for a software program, to make it easier to address and fix the bugs.
build manager A program that keeps track of how files depend on one another and runs commands to update any files that are out-of-date. Build managers were invented to compile only those parts of programs that had changed, but are now often used to implement workflows in which plots depend on results files, which in turn depend on raw data files or configuration files.
build recipe The part of a build rule that describes how to update something that has fallen out-of-date.
build rule A specification for a build manager that describes how some files depend on others and what to do if those files are out-of-date.
build target The file(s) that a build rule will update if they are out-of-date compared to their dependencies.
byte code A set of instructions designed to be executed efficiently by an interpreter.
call stack A data structure that stores information about the subroutines currently being executed.
callback function A function A that is passed to another function B so that B can call it at some later point. Callbacks can be used synchronously, as in generic functions like map that invoke a callback function once for each element in a collection, or asynchronously, as in a client that runs a callback when a response is received in answer to a request.
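A small Python sketch of a synchronous callback. The names `apply_to_each` and `double` are hypothetical; `map` is Python's built-in.

```python
def apply_to_each(values, callback):
    # Function B: calls the callback once for each element.
    return [callback(value) for value in values]

def double(x):
    # Function A: the callback that is passed in.
    return 2 * x

doubled = apply_to_each([1, 2, 3], double)

# The built-in map works the same way: it receives the callback as an argument.
same_with_map = list(map(double, [1, 2, 3]))
```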
camel case A style of writing code that involves naming variables and objects with no space, underscore (_), dot (.), or dash (-) characters, with the first letter of each word capitalized; in some variants the first word is left in lower case. Examples include CalculateSum and findPattern.
Cascading Style Sheets A way to control the appearance of HTML. CSS is typically used to specify fonts, colors, and layout.
catch (an exception) To accept responsibility for handling an error or other unexpected event. R prefers “handling a condition” to “catching an exception.” Python, on the other hand, encourages raising and catching exceptions, and in some situations, requires it.
causation A relationship between distinct events, where it is asserted that one event is responsible for producing or affecting change in the other.
CC-BY The Creative Commons - Attribution license, which requires people to give credit to the author of a work, but imposes no other restrictions.
Central Processing Unit The principal hardware of any digital computer. The CPU constitutes the essential electronic circuitry that interprets and executes instructions from the software or other hardware. Also called a central processor, main processor, or microprocessor.
centroid The center or anchor of a group created by a clustering algorithm.
chi-square test A statistical method for estimating whether two variables in a cross tabulation are correlated. A chi-square distribution varies from a normal distribution based on the degrees of freedom used to calculate it.
child (in a tree) A node in a tree that is below another node (called the parent).
class imbalance Class imbalance refers to the problem in machine learning where there is an unequal distribution of classes in the dataset.
client Typically, a program such as a web browser that gets data from a server and displays it to, or interacts with, users. The term is used more generally to refer to any program A that makes requests of another program B. A single program can be both a client and a server.
closure A set of variables defined in the same scope whose existence has been preserved after that scope has ended.
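One way to see a closure in Python: in the sketch below, the variable `count` outlives the call to `make_counter` because the inner function still refers to it. All names are hypothetical.

```python
def make_counter():
    count = 0  # defined in make_counter's scope

    def increment():
        # count is preserved in the closure after make_counter has returned.
        nonlocal count
        count += 1
        return count

    return increment

counter = make_counter()
first = counter()
second = counter()

# Each call to make_counter creates a separate closure with its own count.
other = make_counter()
other_first = other()
```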
clustering The process of dividing data into groups when the groups themselves are not known in advance.
code coverage (in testing) How much of a library or program is executed when tests run. This is normally reported as a percentage of lines of code: for example, if 40 out of 50 lines in a file are run during testing, those tests have 80% code coverage.
coercion See type coercion.
cognitive load The amount of working memory needed to accomplish a set of simultaneous tasks.
collection An abstract data type that groups an arbitrary, variable number of data items (possibly zero), to allow processing them in a uniform fashion. Common examples of collections are lists, variable-size arrays and sets. Fixed-size arrays are usually not considered collections.
comma-separated values A text format for tabular data in which each record is one row and fields are separated by commas. There are many minor variations, particularly around quoting of strings.
command An instruction telling a computer program to perform a specific task.
command history An automatically created list of previously executed commands. Most read-eval-print loops (REPLs), including the Unix shell, record history and allow users to play back recent commands.
command-line argument A filename or control flag given to a command-line program when it is run.
command-line interface A user interface that relies solely on text for commands and output, typically running in a shell.
comment Text written in a script that is not treated as code to be run, but rather as text that describes what the code is doing. These are usually short notes, often beginning with a # (in many programming languages).
commit As a verb, the act of saving a set of changes to a database or version control repository. As a noun, the changes saved.
commit message A comment attached to a commit that explains what was done and why.
compile To translate textual source into another form. Programs in compiled languages are translated into machine instructions for a computer to run, and Markdown is usually translated into HTML for display.
compiled language Originally, a language such as C or Fortran that is translated into machine instructions for execution. Languages such as Java are also compiled before execution, but into byte code instead of machine instructions, while interpreted languages like Python are compiled to byte code on the fly.
compiler An application that translates programs written in some languages into machine instructions or byte code.
Comprehensive R Archive Network A public repository of R packages.
computational notebook A combination of a document format that allows users to mix prose and code in a single file, and an application that executes that code interactively and in place. The Jupyter Notebook and R Markdown files are both examples of computational notebooks.
compute shader A general-purpose shader program for use in parallel processing. Often used for machine learning, simulations, and other fields which benefit from parallel computation.
conda A [package manager](#package_manager) and environment management system, particularly popular for Python programs.
condition An error or other unexpected event that disrupts the normal flow of control.
conditional expression A ternary expression that serves the role of an if/else statement. For example, C and similar languages use the syntax test ? ifTrue : ifFalse to mean “choose the value ifTrue if test is true, or the value ifFalse if it is not.”
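Python writes its conditional expression in a different order, with the test in the middle. The sketch below, using hypothetical function names and a made-up threshold, shows the expression alongside the equivalent if/else statement.

```python
def describe(temperature):
    # Conditional expression: value_if_true if test else value_if_false.
    return "warm" if temperature >= 20 else "cold"

def describe_with_statement(temperature):
    # The same choice written as an if/else statement, for comparison.
    if temperature >= 20:
        label = "warm"
    else:
        label = "cold"
    return label
```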
confidence interval A range around an estimate that indicates the margin of error, combined with a probability that the actual value falls in that range.
configuration file A file that specifies the parameters and initial settings of a software program. Configuration, or config, files are often used for information subject to changes, such as environment-specific settings.
confusion matrix An N×N matrix that describes the performance of a classification model, where N is the number of classes or outputs. Each row in the matrix represents the instances of actual classes and each column represents the predicted classes. For a binary classification model with the positive class listed first, the confusion matrix gives the True Positives (TP), False Negatives (FN), False Positives (FP) and True Negatives (TN) in the top-left, top-right, bottom-left and bottom-right cells, respectively. The table can be used to calculate Accuracy, Sensitivity and Specificity amongst other measures of the model.
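A minimal sketch of building a confusion matrix in plain Python, with rows for actual classes and columns for predicted classes. The labels and data are made up for illustration, and the helper `confusion_matrix` is hypothetical rather than any library's function.

```python
def confusion_matrix(actual, predicted, labels):
    # Rows are actual classes, columns are predicted classes.
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Made-up binary data: 1 is the positive class, 0 the negative class.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

matrix = confusion_matrix(actual, predicted, labels=[1, 0])
(tp, fn), (fp, tn) = matrix

accuracy = (tp + tn) / (tp + fn + fp + tn)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
```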
console A computer terminal where a user may enter commands, or a program, such as a shell that simulates such a device.
continuation prompt A prompt that indicates that the command currently being typed is not yet complete, and will not be run until it is.
continuous integration A software development practice in which changes are merged into a shared codebase as soon as they become available, typically with automated builds and tests run on each change.
continuous random variable A variable whose value can be any real value, either within a range, or unbounded, such as age or distance.
control flow The logical flow through a program’s code. May be linear (i.e. just a series of commands), but may also include loops or conditional execution (i.e. if a condition is met).
convolutional neural network (CNN) A class of artificial neural network that is primarily used to analyze images. A CNN has layers that perform convolutions, where a filter is shifted over the data, instead of the general matrix multiplications that we see in fully connected neural network layers.
copy-on-modify The practice of creating a new copy of aliased data whenever there is an attempt to modify it so that each reference will believe theirs is the only one.
correlation coefficient A measure of how well-correlated two variables are. If the correlation coefficient between X and Y is 1.0, knowing X allows perfect prediction of Y. If the correlation coefficient is 0.0, knowing X tells you nothing about Y, and if it is -1.0, then X predicts Y, but an increase in X corresponds to a decrease in Y.
covariance How well two variables agree with each other. The correlation coefficient is a normalized measure of covariance.
Creative Commons license A set of licenses that can be applied to published work. Each license is formed by concatenating one or more of -BY (Attribution): users must cite the original source; -SA (ShareAlike): users must share their own work under a similar license; -NC (NonCommercial): work may not be used for commercial purposes without the creator’s permission; -ND (NoDerivatives): no derivative works (e.g., translations) can be created without the creator’s permission. Thus, CC-BY-NC means “users must give attribution and cannot use commercially without permission.” The term CC-0 (zero, not letter ‘O’) is sometimes used to mean “no restrictions,” i.e., the work is in the public domain.
cross join A join that produces all possible combinations of rows from two tables.
cross-validation A technique that divides data into training data and test data. The training data and correct answers are used to find parameters, and the algorithm’s effectiveness is then measured by examining the answers it gives on the test data.
cryptographic hash function A hash function that produces an apparently random value for any input. The same input always produces the same value.
current working directory The folder or directory location in which the program operates. Any action taken by the program occurs relative to this directory.
data collision Occurs when two or more devices or nodes try to transmit signals at the same time on the same network. Similarly, a data collision can also occur when hashing if two distinct pieces of data have the same hash value.
data engineer Someone who sets up and runs data analyses. Data engineers are often responsible for installing software, managing databases, generating reports, and archiving results.
data engineering The pragmatic steps taken to make data usable, such as writing short programs to put mailing addresses in a uniform format.
data masking Altering data in a dataset to anonymize or otherwise remove identifying information. For example, real names might be swapped out with fictional names to provide name-based data without referencing actual people.
data mining The use of computers to search for patterns in large datasets. The term data science is now more commonly used.
data package A software package that contains mostly or only data. It is used to make data simpler to disseminate and easier to use.
data structure A format for the organisation, management, and efficient access of data. Typically it will characterise a set of data values and their representation (or encoding), the relationships between values, and ways to access or manipulate those data, such as reading, altering, or writing.
data visualization The creation of charts, maps, graphs, or infographics to translate datasets into something visual. Sometimes called “dataviz” or “data viz.”
data wrangling A colloquial name for small-scale data engineering.
debug To find and resolve errors (also known as ‘bugs’) within computer programs or systems.
decision tree A tree whose nodes are questions and whose branches eventually lead to a decision or classification.
Decorator pattern A design pattern in which a function adds additional features to another function or a class after its initial definition. Decorators are a feature of Python and can be implemented in most other languages as well.
default target The build target that is used when none is specified explicitly.
default value A value assigned to a function parameter when the caller does not specify a value. Default values are specified as part of the function’s definition.
defensive programming A set of programming practices that assumes mistakes will happen and either reports or corrects them, such as inserting assertions to report situations that are not ever supposed to occur.
degrees of freedom In statistics, the degrees of freedom (often “DF”) is a measure of how much independent information, in the form of data and calculations, has been combined to produce a given statistical parameter. Put another way, the DF is the number of values that are free to vary in the calculation of a given statistical parameter. For a statistic calculated from data which are independent (i.e., the values are uncorrelated), the DF can generally be estimated as the sample size minus the number of individual parameters calculated to obtain the final statistic.
Delegate pattern A design pattern in which an object does most of the work to complete a task, but uses one of a set of other objects to complete some specific parts of the work. Delegation is often used instead of inheritance to customize objects’ behavior.
dependency See prerequisite.
dependent variable A variable whose value depends on the value of another variable, which is called the independent variable.
depth first To go through a nested data structure such as a tree by going as far as possible down one path, then as far as possible down the next and so on, or to explore a problem by following one solution to its conclusion and then trying the next.
design pattern A recurring pattern in software design that is specific enough to be worth naming, but not so specific that a single best implementation can be provided by a library. For example, data frames and database tables are instances of the same pattern.
destructuring assignment Unpacking values from data structures and assigning them to multiple variables in a single statement.
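A few forms of destructuring assignment as they might look in Python, where the feature is usually called unpacking. The variable names and values are hypothetical.

```python
# Unpack a tuple into two variables in a single statement.
point = (3, 4)
x, y = point

# A starred name collects whatever is left over into a list.
first, *rest = [10, 20, 30]

# Nested structures can be unpacked in one step as well.
name, (latitude, longitude) = ("Oslo", (59.9, 10.8))
```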
Digital Object Identifier A unique persistent identifier for a book, paper, report, dataset, software release, or other digital artefact.
dimension reduction Reducing the number of dimensions in a dataset, typically by finding the dimensions along which it varies most.
discrete random variable A variable whose value can take on only one of a fixed set of values, such as true or false.
distro See software distribution.
docstring Short for “documentation string,” a string appearing at the start of a module, class, or function in Python that automatically becomes that object’s documentation.
Document Object Model A standard, in-memory representation of HTML and XML. Each element is stored as a node in a tree with a set of named attributes; contained elements are child nodes. Modern programming languages provide many libraries for searching and modifying the DOM.
documentation generator A software tool that extracts specially formatted comments or docstrings from code and generates cross-referenced developer documentation.
DOM selector A pattern that identifies nodes in a DOM tree. For example, #alpha matches nodes whose id attribute is “alpha”, while .beta matches nodes whose class attribute is “beta”.
domain knowledge Understanding of a specific problem domain, e.g., knowledge of transportation logistics.
Don't Repeat Yourself principle The Don’t Repeat Yourself (DRY) principle states that “every piece of knowledge must have a single, unambiguous, authoritative representation within a system.” The term comes from The Pragmatic Programmer, by Andrew Hunt and David Thomas. Programs that follow the DRY Principle avoid duplication of definitions and logic, so that a change in their behaviour only requires a modification in one part of the code. The goal is to create code that is easier to maintain.
double Short for “double-precision floating-point number”, meaning a 64-bit numeric value with a fractional part and an exponent.
double square brackets An index enclosed in [[...]], used to return a single value of the underlying type.
down-vote A vote against something.
dynamic loading To import a module into the memory of a program while it is already running. Most interpreted languages use dynamic loading, and provide tools so that programs can find and load modules dynamically to configure themselves.
dynamic lookup To find a function or a property of an object by name while a program is running. For example, instead of getting a specific property of an object using obj.name, a program might use obj[someVariable], where someVariable could hold "name" or some other property name.
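The obj[someVariable] form above is how JavaScript spells dynamic lookup. In Python the closest equivalent for object attributes is the built-in `getattr`, sketched below with a hypothetical `Person` class.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

person = Person("Ada", 36)

# Static access: the property name is fixed in the source code.
static_value = person.name

# Dynamic lookup: the property name is held in a variable at run time.
some_variable = "name"
dynamic_value = getattr(person, some_variable)

some_variable = "age"
other_value = getattr(person, some_variable)
```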
dynamic scoping To find the value of a variable by looking at what is on the call stack at the moment the lookup is done. Almost all programming languages use lexical scoping instead, since it is more predictable.
edge A connection between two nodes in a graph. An edge may have data associated with it, such as a name or distance.
Electronic mail Electronic mail is a method for delivering messages between people over a computer network. Messages are sent via an SMTP server and retrieved using either an IMAP or POP server.
element A named component in an HTML or XML document. Elements are usually written <name>…</name>, where “…” represents the content of the element. Elements often have attributes.
Emacs (editor) A text editor that is popular among Unix programmers.
empty element An element of an HTML or XML document that has no children. Empty elements can always be written as <name></name>, but may also be written using the shorthand notation <name/> (with a slash after the name inside the angle brackets).
empty vector A vector that contains no elements. Empty vectors have a type such as logical or character, and are not the same as null.
encoding The process of putting a sequence of characters such as letters, numbers, punctuation, and certain symbols, into a specialized format for efficient transmission or storage.
environment A structure that stores a set of variable names and the values they refer to.
error (in a test) Signalled when something goes wrong in a unit test itself rather than in the system being tested. In this case, we do not know anything about the correctness of the system.
error handling What a program does to detect and correct for errors. Examples include printing a message and using a default configuration if the user-specified configuration cannot be found.
escape sequence A sequence of characters added as a prefix to some other character that would otherwise have a special meaning, temporarily altering the meaning of the character. For example, the escape sequence \" is used to represent a double-quote character inside a double-quoted string.
evaluation The process of taking an expression such as 1+2*3/4 and turning it into a single, irreducible value.
exception An object that stores information about an error or other unusual event in a program. One part of a program will create and raise an exception to signal that something unexpected has happened; another part will catch it.
exception handler A piece of code that deals with an exception after it is caught, e.g., by writing a log message, retrying the operation that failed, or performing an alternate operation.
export To make something visible outside a module so that other parts of a program can import it. In most languages a module must export things explicitly in an attempt to avoid name collision.
false The logical (Boolean) state opposite of “true”. Used in logic and programming to represent a binary state of something.
false negative Data points which are actually true but incorrectly predicted as false.
false positive Data points which are actually false but incorrectly predicted as true.
falsy Evaluating to false in a Boolean context.
FASTQ A file format for storing genomic sequence information and the corresponding quality scores. Information for each sequence is broken up into a block of four lines. Line 1 contains information about the sequence and begins with ‘@’. Line 2 contains the actual genomic sequence using single-letter codes to represent nucleotides. Line 3 is a separator that begins with a +. Line 4 has a string of quality characters for each base in the genomic sequence.
Feature An individual characteristic or property of a phenomenon that is measurable (e.g. length, height, number of petals) and used as the input to a model. Finding or selecting features that are highly independent and discriminatory is a fundamental part of classification.
feature (in data) A variable or observable in a dataset.
feature (in software) Some aspect of software that was deliberately designed or built. A bug is an undesired feature.
feature branch A branch within a Git repository containing commits dedicated to a specific feature, e.g., a bug fix or a new function. This branch can be merged into another branch.
feature engineering The process of choosing the variables to be used as inputs to a model. Choosing good features often depends on domain knowledge.
feature request A request to the maintainers or developers of a software program to add a specific functionality (a feature) to that program.
field A component of a record containing a single value. Every record in a tibble or database table has the same fields.
filename extension The last part of a filename, usually following the ‘.’ symbol. Filename extensions are commonly used to indicate the type of content in the file, though there is no guarantee that this is correct.
filename stem The part of the filename that does not include the extension. For example, the stem of glossary.yml is glossary.
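In Python, the standard `pathlib` module exposes both parts of a filename directly. The sketch below uses `PurePosixPath` so that nothing needs to exist on disk; `archive.tar.gz` is a made-up second example.

```python
from pathlib import PurePosixPath

path = PurePosixPath("glossary.yml")
stem = path.stem            # the filename without its extension
extension = path.suffix     # the extension, including the leading dot

# Only the final extension is split off when a name contains several dots.
archive = PurePosixPath("archive.tar.gz")
archive_stem = archive.stem
archive_extension = archive.suffix
```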
filter As a verb, to choose a set of records (i.e., rows of a table) based on the values they contain. As a noun, a command-line program that reads lines of text from files or standard input, performs some operation on them (such as filtering), and writes to a file or stdout.
fixture The thing on which a test is run, such as the parameters to the function being tested or the file being processed.
fork A copy of one person’s Git repository that lives in another person’s GitHub account. Changes to the content of a fork can be submitted to the upstream repository via a pull request.
fragment shader The shader stage in the rendering pipeline designated towards calculating colours for each fragment on the screen. For each pixel covered by a primitive, a fragment is generated. All fragments for each pixel will have their colours combined based on depth and opacity after the fragment shader stage is complete.
Frequently Asked Questions A curated list of questions commonly asked about a subject, along with answers.
full identifier (of a commit) A unique 160-bit identifier for a commit in a Git repository, usually written as a 40-character hexadecimal string.
full join A join that returns all rows and all columns from two tables A and B. Where the keys of A and B match, values are combined; where they do not, missing values from either table are filled with null, NA, or some other missing value signifier.
fully-qualified name An unambiguous name of the form package::thing, indicating the original source of the object in question.
function A code block which gathers a sequence of operations into a whole, preserving it for ongoing use by defining a set of tasks that takes zero or more required and optional arguments as inputs and returns expected outputs (return values), if any. Functions enable repeating these defined tasks with one command, known as a function call.
functional programming A style of programming in which data is transformed through successive application of functions, rather than by using control structures such as loops. In functional programming, there must be a direct relationship between the input to a function and the output produced by the function, meaning the result should not be affected by the current values of global variables or other parts of the global program state. It also requires that functions do not produce side effects, meaning they do not modify the global program state, or do anything other than computing the return value, such as writing output to a log file, or printing to the console.
garbage in, garbage out The idea that messy data as an input will result in messy data as an output.
generator function A function whose state is automatically saved when it returns a value so that execution can be restarted from that point the next time it is called. One use of generator functions is to produce streams of values that can be processed by for loops.
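A short Python sketch of a generator function, using the hypothetical name `countdown`. The `yield` statement hands back a value and pauses the function until the next value is requested.

```python
def countdown(start):
    # Execution pauses at each yield and resumes there on the next request.
    while start > 0:
        yield start
        start -= 1

# A for loop (here inside a list comprehension) consumes the stream of values.
collected = [n for n in countdown(3)]

# The same stream can be stepped through by hand with next().
stream = countdown(2)
first = next(stream)
second = next(stream)
```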
Geometric Mean Calculated from a set of n numbers by first computing the product of those numbers, and then computing the n-th root of the result. In contrast to the arithmetic mean, which measures central tendency in an “additive” way, the geometric mean measures central tendency in a “multiplicative” way, and hence is often appropriate when estimating average rates of change or some other multiplicative constant.
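The calculation described above can be sketched in a few lines of Python. The function name `geometric_mean` and the sample values are hypothetical, and `math.prod` assumes Python 3.8 or later.

```python
import math

def geometric_mean(values):
    # Product of the n values, followed by the n-th root of that product.
    return math.prod(values) ** (1 / len(values))

# The product of 2 and 8 is 16, and its square root is 4.
result = geometric_mean([2, 8])

# The arithmetic mean of the same values, for comparison.
arithmetic = sum([2, 8]) / 2
```

For long lists or very large values, averaging logarithms and exponentiating the result is a common way to avoid overflow in the product.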
geometry shader The shader stage in the rendering pipeline designated towards processing primitives. Not to be confused with tessellation shaders, geometry shaders are focused on modifying the shape of primitives to create new results. For example, pixels may be converted into particles using a geometry shader.
ggplot2 A package in R that implements a layered grammar of graphics for generating plots. It is a popular alternative to plotting with base R and part of the tidyverse.
Git A version control tool to record and manage changes to a project.
Git branch A snapshot of a version of a Git repository. Multiple branches can capture multiple versions of the same repository.
Git clone A copy (usually downloaded) of a remote Git repository on a local computer.
Git conflict A situation in which incompatible or overlapping changes have been made on different branches that are now being merged.
Git fork To make a new copy of a Git repository on a server, or the copy that is made.
Git merge Merging branches in Git incorporates the development histories of two branches into one. If changes are made to the same parts of files on both branches, a conflict will occur and must be resolved before the merge can be completed.
Git pull Downloads and synchronizes changes between a remote repository and a local repository.
Git push Uploads and synchronizes changes between a local repository and a remote repository.
Git remote A short name for a remote repository (like a bookmark).
GitHub A cloud-based platform built around Git that allows you to save versions of your project online and collaborate with other Git users.
global environment The environment that holds top-level definitions in a programming language, e.g., those written directly in the interpreter.
globbing To specify a set of filenames using a simplified form of regular expressions, such as *.dat to mean “all files whose names end in .dat”. The name is derived from “global”.
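A small Python sketch of the same idea uses the standard library's `fnmatch` module to apply a glob pattern to a list of names. The filenames are made up for illustration:

```python
import fnmatch

# Hypothetical filenames; no real files are needed for this sketch.
names = ["a.dat", "b.txt", "notes.dat", "dat.csv"]

# Keep only the names that end in .dat.
matches = fnmatch.filter(names, "*.dat")
```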
GNU Operating System “GNU” is an operating system that is free software. GNU is a recursive acronym for “GNU is Not Unix!”. The GNU operating system consists of GNU packages as well as free software released by third parties.
GNU Public License A license that allows people to re-use software as long as they distribute the source of their changes.
gradient boosting A machine learning technique that produces an ensemble of weak prediction models (typically decision trees) in a stepwise fashion.
graph
  1. A plot or a chart that displays data, or 2. a data structure in which nodes are connected to one another by edges.
graphical user interface A user interface that relies on windows, menus, pointers, and other graphical elements, as opposed to a command-line interface or voice-driven interface.
Graphics Processing Unit Specialized processor designed to run many instances of a single program in parallel. Originally designed for use in graphics, but is also used for general computation in the form of compute shaders.
group To divide data into subsets according to a set of criteria while leaving records in a single structure.
handle (condition) To accept responsibility for handling an error or other unexpected event. R prefers “handling a condition” to “catching an exception”. Python, on the other hand, encourages raising and catching exceptions, and in some situations, requires it.
hardware Any physical component of a computer system. Hardware can be internal, such as CPU, memory, and graphics cards; or external, such as monitors and keyboards. Hardware operates in conjunction with software to produce a functioning computer system.
Harmonic Mean Calculated from a set of n numbers by first computing the sum of the reciprocals of those numbers, and then dividing n by the resulting sum. Alternatively, it can be computed as the reciprocal of the arithmetic mean of the reciprocal values. Similarly to the geometric mean, the harmonic mean is often used as an alternative measure of central tendency to the usual arithmetic mean when estimating an average rate of change or some other multiplicative constant. For a set of positive numbers that are not all equal, min < HM < GM < AM < max, where min is the minimum value, max is the maximum value, and HM, GM, and AM are the harmonic, geometric, and arithmetic means respectively.
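The definition and the ordering of the three means can be checked with a short Python sketch. The helper `harmonic_mean` and the sample speeds are illustrative assumptions, while `statistics.geometric_mean` and `statistics.mean` come from the standard library:

```python
import statistics


def harmonic_mean(values):
    """Divide n by the sum of the reciprocals of the values."""
    return len(values) / sum(1 / v for v in values)


# Hypothetical speeds over two legs of equal distance.
speeds = [40.0, 60.0]
hm = harmonic_mean(speeds)
gm = statistics.geometric_mean(speeds)
am = statistics.mean(speeds)

# For positive values that are not all equal, the means are strictly ordered.
ordered = min(speeds) < hm < gm < am < max(speeds)
```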
hash function A function that turns arbitrary data into a bit array, or a key, of a fixed size. Hash functions are used to determine where data should be stored in a hash table.
hash table A data structure that calculates a pseudo-random key (location) for each value passed to it and stores the value in that location. Hash tables enable fast lookup for arbitrary data. This occurs at the cost of extra memory because hash tables must always be larger than the amount of information they need to store, to avoid the possibility of data collisions, when the hash function returns the same key for two different values.
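To make the idea concrete, here is a minimal Python sketch of a hash table. The class name `TinyHashTable` and its methods are hypothetical, and this version resolves collisions by keeping a small list in each slot (chaining), which is only one of several possible strategies:

```python
class TinyHashTable:
    """A toy hash table that stores (key, value) pairs in a fixed number of slots."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _slot(self, key):
        # The built-in hash function decides where the pair is stored.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._slot(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # replace the value for a known key
                return
        bucket.append((key, value))        # new key, possibly colliding with others

    def get(self, key):
        for existing_key, value in self.buckets[self._slot(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = TinyHashTable()
table.put("alpha", 1)
table.put("beta", 2)
table.put("alpha", 3)
```

With only one slot every key collides, but lookups still work because each slot holds a list; they simply become slower.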
header row If present, the first row of a data file that defines column names (but not their data types or units).
heterogeneous Containing mixed data types. For example, in Python and R, a list can contain a mix of numbers, character strings, and values of other types.
hexadecimal A base-16 number system. Hexadecimal values are usually written using the digits 0-9 and the characters A-F in either upper or lower case. Hexadecimal is often used to represent binary values, since two hexadecimal digits represent exactly one byte.
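A few standard Python conversions illustrate the notation; the variable names are arbitrary:

```python
value = 0xFF                      # hexadecimal literal for 255
text = format(value, "02x")       # back to a two-digit lowercase hex string
parsed = int("ff", 16)            # parse a hex string as base 16
raw = bytes.fromhex("ff00")       # four hex digits become two bytes
```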
higher-order function A function that operates on other functions. For example, the higher-order function map executes a given function once on each value in a list. Higher-order functions are heavily used in functional programming.
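As a brief Python illustration, the built-in `map` takes a function as an argument, and the hypothetical helper `apply_twice` does the same:

```python
def apply_twice(func, value):
    """Call func on value, then call func again on the result."""
    return func(func(value))


def add_three(x):
    return x + 3


doubled = list(map(lambda x: 2 * x, [1, 2, 3]))   # map runs the lambda on each item
result = apply_twice(add_three, 10)
```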
Hippocratic License An ethical software license that allows free use for any purpose that does not contravene the Universal Declaration of Human Rights.
histogram A graphical representation of the distribution of a set of numeric data, usually a vertical bar graph.
hitchhiker Someone who is part of a project but does not actually do any work on it.
home directory A directory that contains a user’s files. Each user on a multi-user computer will have their own home directory; a personal computer will often only have one home directory.
homogeneous Containing a single data type. For example, a vector must be homogeneous: its values must all be numeric, logical, etc.
HTTP header A key-value pair at the top of an HTTP request or response that carries additional information such as the user’s preferred language or the length of the data being transferred.
HTTP request A message sent from a client to a server using the HTTP protocol asking for data. A request usually asks for a web page, image, or other data.
HTTP response A reply sent from a server to a client using the HTTP protocol in response to a request. The response usually contains a web page, image, or data.
HyperText Markup Language The standard markup language used for web pages. HTML is represented in memory using DOM (Document Object Model).
HyperText Transfer Protocol The standard protocol for data transfer on the World-Wide Web. HTTP defines the format of requests and responses, the meanings of standard error codes, and other features.
icon In computing, an icon is a graphic symbol that is displayed on a computer screen to help a user navigate the computer system.
immutable type A type whose values cannot be changed: the state of an object of this type cannot be modified after it is created.
impostor syndrome The false belief that one’s successes are a result of accident or luck rather than ability.
in-place operator An operator that updates one of its operands. For example, the expression x += 2 uses the in-place operator += to add 2 to the current value of x and assign the result back to x.
index A value, usually an integer, that identifies the position of an element in an array or other sequence.
infinite loop A loop where the exit condition is never met, so the loop continues to repeat itself. Often a programming error.
inner join A join that returns the combination of rows from two tables, A and B, whose keys exist in both tables.
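A minimal sketch in plain Python, using lists of dictionaries as stand-ins for tables. The table contents and the `id` key are invented for illustration, and the sketch assumes each key appears at most once in the second table:

```python
table_a = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bo"}]
table_b = [{"id": 2, "city": "Oslo"}, {"id": 3, "city": "Lima"}]

# Index table B by key so matching rows can be found quickly.
b_by_id = {row["id"]: row for row in table_b}

# Keep only rows whose key exists in both tables, combining their fields.
inner = [
    {**row_a, **b_by_id[row_a["id"]]}
    for row_a in table_a
    if row_a["id"] in b_by_id
]
```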
instance An object of a particular class.
integration test A test that checks whether the parts of a system work properly when put together.
interface A ubiquitously used phrase in computing that describes a point of contact. This could be a user interface (e.g. graphical user interface or command line), the interface of an object with the rest of the code or how a program can interact with web services through an API.
Internet Message Access Protocol A standard internet protocol used by email clients to retrieve messages from an email server. Messages are left on the server so that they can be accessed from multiple email clients.
interpreted language A high-level language that is not executed directly by the computer, but instead is run by an interpreter that translates program instructions into machine commands on the fly.
interpreter A program whose job it is to run programs written in a high-level interpreted language. Interpreters can run interactively, but may also execute commands saved in a file.
invariant Something that must be true at all times inside of a program or during the lifecycle of an object. Invariants are often expressed using assertions. If an invariant expression is not true, this is indicative of a problem, and may result in failure or early termination of the program.
IPython Short for “Interactive Python”, it is a console designed to assist interactive and exploratory programming with features such as coloured text, tab-completion, filesystem navigation, quick access to documentation and shell commands.
ISO date format An international standard for formatting dates. While the full standard is complex, the most common form is YYYY-MM-DD, i.e., a four-digit year, a two-digit month, and a two-digit day, separated by hyphens.
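In Python, the standard `datetime` module reads and writes the common YYYY-MM-DD form directly; the dates below are arbitrary examples:

```python
from datetime import date

day = date(2024, 3, 9)
text = day.isoformat()                       # format as YYYY-MM-DD
parsed = date.fromisoformat("2024-03-09")    # parse the same form back

# Because the year comes first and fields are zero-padded,
# sorting the strings also sorts the dates chronologically.
in_order = sorted(["2024-10-01", "2023-12-31", "2024-03-09"])
```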
issue A bug report, feature request, or other to-do item associated with a project. Also called a ticket.
issue tracking system Similar to a bug tracking system in that it tracks issues made to a repository, usually in the form of feature requests, bug reports, or some other to-do item.
Iterator pattern A design pattern in which a temporary object or generator function produces each value from a collection in turn for processing. This pattern hides the differences between different kinds of data structures so that everything can be processed using loops.
Java Java is a high-level, cross-platform, object-oriented and general-purpose programming language. Programs written in Java will run on any platform that supports the Java software platform without having to be recompiled. This feature gave rise to the slogan “Write Once Run Anywhere”. Java syntax is similar to that of C and C++.
JavaScript Object Notation A way to represent data by combining basic values like numbers and character strings in lists and key/value structures. The acronym stands for “JavaScript Object Notation”; unlike better-defined standards like XML, it is unencumbered by a syntax for comments or ways to define a schema.
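A short round trip with Python's standard `json` module shows the combination of basic values, lists, and key/value structures. The record itself is made up:

```python
import json

# A hypothetical record mixing strings, numbers, a list, and a Boolean.
record = {"name": "Ada", "scores": [90, 85], "active": True}

text = json.dumps(record)      # Python data -> JSON text
restored = json.loads(text)    # JSON text -> Python data
```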
join One of several operations that combine values from two tables.
Jupyter Project Jupyter is a non-profit, open-source project that was born out of the IPython Project in 2014 as IPython evolved to support interactive data science and scientific computing in many different programming languages.
Jupyter Notebook An open-source, web-based computational notebook that allows the user to write and share live code, equations, visualisations, and narrative text.
JupyterLab A next-generation interface to Jupyter Notebooks. JupyterLab is open-source, web-based and has a multiple-document interface which supports working with multiple notebooks and Markdown files in a single browser tab. JupyterLab also supports opening terminal/console windows in the browser.
k-means clustering An unsupervised learning algorithm that forms k groups by repeatedly calculating the centroid of the current groups and then reallocating data points to the nearest centroid until the centroids no longer move.
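The loop described above can be sketched for one-dimensional data in plain Python. The function name `k_means_1d`, the starting centroids, and the sample points are all assumptions for illustration; real implementations handle many dimensions and choose starting centroids more carefully:

```python
def k_means_1d(points, centroids, max_iterations=100):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(max_iterations):
        # Allocate each point to the group of its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group (keep it if the group is empty).
        new_centroids = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centroids)
        ]
        if new_centroids == centroids:   # centroids no longer move
            break
        centroids = new_centroids
    return centroids, groups


centroids, groups = k_means_1d([1, 2, 3, 10, 11, 12], [0.0, 5.0])
```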
k-nearest neighbors A classification algorithm that classifies data points based on their similarity to their k nearest neighbours.
kebab case A naming convention in which the parts of a name are separated with dashes, as in first-second-third.
key
  1. A field or combination of fields whose value(s) uniquely identify a record within a table or dataset. Keys are often used to select specific records and in joins.
  2. Part of a key/value pair, used as a unique identifier in a data structure such as a dictionary.
keyword arguments Extra (often optional) arguments given to a function as key/value pairs.
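For example, in Python (the function `describe` and its parameters are hypothetical), optional parameters with defaults can be overridden by name in any order:

```python
def describe(name, greeting="Hello", punctuation="!"):
    """Build a greeting; greeting and punctuation are optional."""
    return f"{greeting}, {name}{punctuation}"


plain = describe("Ada")                                   # defaults used
custom = describe("Ada", punctuation="?", greeting="Hi")  # keyword arguments
```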
label (an issue) A short textual tag associated with an issue to categorize it. Common labels include bug and feature request.
latent variable A variable that is not observed directly but instead is inferred from the states or values of other variables.
LaTeX A typesetting system for document preparation that uses a specialized markup language to define a document structure (e.g., headings), stylise text, insert mathematical equations, and manage citations and cross-references. LaTeX is widely used in academia, in particular for scientific papers and theses in mathematics, physics, engineering, and computer science.
lazy evaluation Delaying evaluation of an expression until the value is actually needed or, in the case of a conditional expression, only evaluating as much of the expression as is necessary. For instance, the second half of A and B will only be evaluated if A is truthy.
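The short-circuit behaviour of `and` can be observed in Python with a small helper that records when it is called. The helper `check` and the `calls` list are illustrative only:

```python
calls = []


def check(label, result):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return result


# A is falsy, so B is never evaluated.
outcome = check("A", False) and check("B", True)
```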
left join A join that combines data from two tables, A and B. Where a key in table A matches a key in table B, the fields are concatenated. Where a key in table A does not match a key in table B, columns from table B are filled with null, NA, or some other missing value. Keys from table B that do not match keys from table A are excluded from the result.
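A minimal plain-Python sketch of a left join over lists of dictionaries follows. The function `left_join`, its `b_columns` parameter, and the sample tables are hypothetical; `None` stands in for the missing value, and each key is assumed to appear at most once in table B:

```python
def left_join(table_a, table_b, key, b_columns):
    """Keep every row of table_a, adding table_b's columns where keys match."""
    b_by_key = {row[key]: row for row in table_b}
    result = []
    for row_a in table_a:
        row_b = b_by_key.get(row_a[key], {})
        merged = dict(row_a)
        for column in b_columns:
            merged[column] = row_b.get(column)   # None when there is no match
        result.append(merged)
    return result


table_a = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bo"}]
table_b = [{"id": 2, "city": "Oslo"}, {"id": 3, "city": "Lima"}]
joined = left_join(table_a, table_b, "id", ["city"])
```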
lexical scoping To look up the value associated with a name according to the textual structure of a program. Most programming languages use lexical scoping instead of dynamic scoping because the latter is less predictable.
license A legal document describing how something can be used, and by whom.
lifecycle The steps that something is allowed or required to go through. The lifecycle of an object runs from its construction through the operations it can or must perform before it is destroyed; the lifecycle of an issue may be: “created”, “assigned”, “in progress”, “ready for review”, and “completed”.
lift How well a model predicts or classifies things, measured as the ratio of the response in the segment identified to the response in the population as a whole. A lift of 1 means the model does no better than chance; higher lift means the model is doing better.
line comment A comment in a program that spans part of a single line, as opposed to a block comment that may span multiple lines.
linear regression A method for finding the best straight-line fit between two datasets, typically by minimizing the squares of the distances between the points and a regression line.
linter A program that checks for common problems in software, such as violations of indentation rules or variable naming conventions. The name comes from the first tool of its kind, called lint.
Lisp A family of programming languages that represent programs and data as nested lists. Many other programming languages have borrowed ideas from Lisp.
list A vector that can contain values of many different (heterogeneous) types.
list comprehension In Python, an expression that creates a new list in place. For example, [2*x for x in values] creates a new list whose items are the doubles of those in values.
log A record of a program’s execution containing messages written via a logging framework for later inspection.
log message A single entry in a log of a program’s execution. Log messages are usually highly structured so that data (such as the time or the severity) can be recovered later.
logging framework A software library that manages internal reporting for programs.
logging level A setting that controls how much information is generated by a logging framework. Typical logging levels include DEBUG, WARNING, and ERROR.
logical indexing To index a vector or other structure with a vector of Booleans, keeping only the values that correspond to true values. Also referred to as masking.
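Plain Python lists do not support logical indexing directly, but the idea can be sketched with `itertools.compress` from the standard library; the values and the threshold are arbitrary:

```python
from itertools import compress

values = [3, 8, 1, 9]
mask = [v > 2 for v in values]         # one Boolean per value
kept = list(compress(values, mask))    # keep values where the mask is True
```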
logistic regression A method for fitting a model to some data that uses logistic (S-shaped) curves instead of straight lines.
long identifier (of commit) See full identifier.
long option A full-word identifier for a command-line argument. While most common flags are a single letter preceded by a dash, such as -v, long options typically use two dashes and a readable name, such as --verbose.
loop body The statement or statements executed by a loop.
machine learning The study or use of algorithms whose performance improves as they are given more data. Machine learning algorithms often use training data to build a model. Their performance is then measured by how well they predict the properties of test data.
magic number An unnamed numerical constant that appears in a program without explanation.
Make The original build manager for Unix, still in daily use after more than forty years.
Makefile A file containing commands for Make, often actually called Makefile.
Markdown A markup language with a simple syntax intended as a replacement for HTML. Markdown is often used for README files, and is the basis for R markdown.
Markov Chain Any model describing a series of events in which the probability of each event depends only on the current state, not on the path taken to reach that state.
markup language A set of rules for annotating text to define its meaning or how it should be displayed. The markup is usually not displayed, but instead controls how the underlying text is interpreted or shown. Markdown and HTML are widely-used markup languages for web pages.
Martha's Rules A simple set of rules for making decisions in small groups.
Masking [TODO] to be defined
master branch A dedicated, permanent, central branch which should contain a “ready product”. After a new feature is developed on a separate branch to avoid breaking the main code, it can be merged into the master branch.
maximum likelihood estimation To choose the parameters for a probability distribution in order to maximize the likelihood of obtaining observed data.
mean The average value of a dataset, more properly known as the arithmetic mean to distinguish it from the geometric and harmonic means.
merge (Git) See Git merge
milestone A target that a project is trying to meet, often represented as a set of issues that all have to be resolved by a certain time.
MIME type A standard way to identify the contents of files on the internet. The term is an acronym of “multi-purpose Internet mail extension”, and MIME types are often identified by filename extensions, such as .png for PNG-formatted images.
minimum spanning tree A minimum spanning tree is a data structure that describes a set of edges that connects all of the nodes in a graph while minimizing the total weight of the included edges. The minimum spanning tree may refer to either the algorithm to calculate the structure or the resulting structure itself.
missing value A special value such as null or NA used to indicate the absence of data. Missing values can signal that data was not collected or that the data did not exist in the first place (e.g., the middle name of someone who does not have one).
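MIT License A license that allows people to re-use software with very few restrictions, the main one being that the copyright and license notice must be kept with copies of the software.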
mock object A simplified replacement for part of a program whose behavior is easy to control and predict. Mock objects are used in unit tests to simulate databases, web services, and other complex systems.
model A specification of the mathematical relationship between different variables.
Monte Carlo method Any method or algorithm that relies on artificially-injected randomness.
moving average The mean of each set of several consecutive values from time series data.
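One simple way to compute this in Python is sketched below. The function name `moving_average` and its `window` parameter are illustrative, and this version only produces values where a full window is available:

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


smoothed = moving_average([1, 2, 3, 4, 5], 3)
```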
multi-threaded Capable of performing several operations simultaneously. Multi-threaded programs are usually more efficient than single-threaded ones, but also harder to understand and debug.
mutable type An object of this type may be changed and its state can be modified after it is created.
mutation Changing data in place, such as modifying an element of an array or adding a record to a database.
n-gram A sequence of $N$ items, typically words in natural language. For example, a trigram is a sequence of three words. N-grams are often used as input in computational linguistics.
n-th root The n-th root of a positive number x is the number that when multiplied by itself n times produces x. This can commonly be calculated by raising x to the power of the reciprocal of n.
NA A special value used to represent data that is not available.
naive Bayes classifier Any classification algorithm based on Bayes’ Theorem that assumes every feature being classified is independent of every other feature.
name collision The ambiguity that arises when two or more things in a program that have the same name are active at the same time. Most languages use namespaces to prevent such collisions.
named argument A function parameter that is given a value by explicitly naming it in a function call.
namespace A collection of names in a program that exists in isolation from other namespaces. Each function, object, class, or module in a program typically has its own namespace so that references to “X” in one part of a program do not accidentally refer to something called “X” in another part of the program. Scope is a distinct, but related, concept.
Nano (editor) A very simple text editor found on most Unix systems.
negative selection To specify the elements of a vector or other data structure that are not desired by negating their indices.
non-blocking execution To allow a program to continue running while an operation is in progress. For example, many systems support non-blocking execution for file I/O so that the program can continue doing work while it waits for data to be read from or written to the filesystem (which is typically much slower than the CPU).
non-parametric (statistics) A branch of statistical tests which do not assume a known distribution of the population which the samples were taken from (Kruskal-Wallis and Dunn test are examples of non-parametric tests).
normal distribution A continuous random distribution with a symmetric bell-curve shape. As datasets get larger, some of their most important statistical properties can be modeled using a normal distribution.
NoSQL database Any database that does not use the relational model. The name comes from the fact that such databases do not use SQL as a query language.
null A special value used to represent a missing object. Null is not the same as NA, and neither is it the same as an empty vector.
null hypothesis The claim that any patterns seen in data are entirely due to chance. Other claims (e.g., “X causes Y”) must be much more likely than the null hypothesis in order to be substantiated.
nullary expression An “expression” with no arguments, such as the value 3.
objective function A function of one or more variables used to measure or compare the goodness of different solutions in an optimization problem.
observation A value or property of a specific member of a population.
off-by-one error A common error in programming in which the program refers to element i of a structure when it should refer to element i-1 or i+1, or processes N elements when it should process N-1 or N+1.
open science A generic term for making scientific software, data, and publications generally available.
OpenRefine A standalone, open source desktop application for data cleanup and transformations, also known as data wrangling.
operating system A program that provides a standard interface to whatever hardware it is running on. Theoretically, any program that only interacts with the operating system should run on any computer that operating system runs on.
optional parameter A parameter that does not have to be given a value when a function is called. Most programming languages require programmers to define default values for optional parameters, or assign them a special value automatically. Arguments passed to optional parameters will often be specified using keyword arguments.
ORCID An Open Researcher and Contributor ID that uniquely and persistently identifies an author of scholarly works. ORCIDs are for people what DOIs are for documents.
orthogonality The ability to use various features of software in any order or combination. Orthogonal systems tend to be easier to understand, since features can be combined without worrying about unexpected interactions.
outlier Extreme values that might be measurement or recording errors, or might actually be rare events. Outliers are sometimes ignored when doing statistics, or handled or visualized separately.
overfitting Fitting a model so closely to one dataset that it does not generalize to others.
p value The probability of obtaining a result at least as strong as the one observed if the null hypothesis is true (i.e., if variation is purely due to chance). The lower the p-value, the more likely it is that something other than chance is having an effect.
package A collection of code, data, and documentation that can be distributed and re-used. Also referred to in some languages as a library or module.
package manager A program that does its best to keep track of the different software installed on a computer and their dependencies on one another.
pager A program that displays a few lines of text at a time.
parameter A variable specified in a function definition whose value is passed to the function when the function is called. Parameters and arguments are distinct, but related concepts. Parameters are variables and arguments are the values assigned to those variables.
parametric (statistics) A branch of statistical tests which assume a known distribution of the population which the samples were taken from (ANOVA and Student’s t-tests are examples of parametric tests).
parent (in a tree) A node in a tree that is above another node (called a child). Every node in a tree except the root node has a single parent.
parent directory The directory that contains another directory of interest. Going from a directory to its parent, then its parent, and so on eventually leads to the root directory of the filesystem.
parse To translate the text of a program or web page into a data structure in memory that the program can then manipulate.
patch A single file containing a set of changes to a set of files, separated by markers that indicate where each individual change should be applied.
path (in filesystem) A string that specifies a location in a filesystem. In Unix, the directories in a path are joined using /.
pattern rule A generic build rule that describes how to update any file whose name matches a pattern. Pattern rules often use automatic variables to represent the actual filenames.
Peanuts An American comic strip by Charles M. Schulz which has inspired the names of R versions.
phony target A build target that does not correspond to an actual file. Phony targets are often used to store commonly used commands in a Makefile.
Pip Install Packages The standard package manager for Python. pip enables the download and installation of Python packages not included in the standard library.
pipe (in the Unix shell) The | used to make the output of one command the input of the next.
pipe operator The %>% used to make the output of one function the input of the next.
pivot table A technique for summarizing tabular data in which each cell represents the sum, average, or other function of the subset of the original data identified by the cell’s row and column heading.
pointcloud A set of discrete data points in three-dimensional space.
Poisson distribution A discrete random distribution that expresses the probability of $N$ events occurring in a fixed time interval if the events occur at a constant rate, independent of the time since the last event.
positional argument An argument to a function that gets its value according to its place in the function’s definition, as opposed to a named argument that is explicitly matched by name.
Post Office Protocol A standard internet protocol used by email clients to retrieve messages from an email server. Messages are generally downloaded and deleted from the server, making it difficult to access messages from multiple email clients. POP3 (version 3) is the version of POP in common use.
posterior distribution Probability distribution summarizing the prior distribution and the likelihood function.
pothole case A naming style that separates the parts of a name with underscores, as in first_second_third.
preamble A series of commands, either placed in the main document, or kept in a separate document, that are included prior to the \begin{document} command. The preamble defines the type of the document, along with other formatting attributes and parameters. This is also the section of the document where packages are added using the command \usepackage{} to enable additional functionalities, and where custom commands can be defined.
prerequisite Something that a build target depends on.
principal component analysis An algorithm that finds the axis along which data varies most, then the axis that accounts for the largest part of the remaining variation, and so on.
prior distribution The probability distribution that is assumed as a starting point when using Bayes’ Theorem and used to construct a more accurate posterior distribution.
probability distribution A mathematical description of all possible outcomes of a random event, and the probability of each occurring.
procedural generation A method of generating data algorithmically rather than manually. Typically this is done to reduce file sizes, increase the overall amount of content, and/or incorporate randomness at the expense of processing power.
procedural programming A style of programming in which functions operate on data that is passed into them. The term is used in contrast to other programming styles, such as object-oriented programming and functional programming.
process An operating system’s representation of a running program. A process typically has some memory, the identity of the user who is running it, and a set of connections to open files.
product manager The person responsible for defining what features a product should have.
production code Software that is delivered to an end user. The term is used to distinguish such code from test code, deployment infrastructure, and everything else that programmers write along the way.
project manager The person responsible for ensuring that a project moves forward.
prompt The text printed by an REPL or shell that indicates it is ready to accept another command. The default prompt in the Unix shell is usually $, while in Python it is >>>, and in R it is >.
protocol Any standard specifying how two pieces of software interact. A network protocol such as HTTP defines the messages that clients and servers exchange on the World-Wide Web; object-oriented programs often define protocols for interactions between objects of different classes.
provenance A record of where data originally came from and what was done to process it.
pseudo-random number generator A function that can generate pseudo-random numbers.
pull indexing Vectorized indexing in which the value at location i in the index vector specifies which element of the source vector is being pulled into that location in the result vector, i.e., result[i] = source[index[i]].
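The rule result[i] = source[index[i]] can be written out in plain Python as a sketch; the sample vectors are arbitrary:

```python
source = ["a", "b", "c", "d"]
index = [3, 0, 0]

# Each entry of index says which source element to pull into that position.
result = [source[i] for i in index]
```

The result always has the same length as the index vector, and a source element may be pulled more than once or not at all.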
pull request The request to merge a new feature or correction created on a user’s fork of a Git repository into the upstream repository. The developer will be notified of the change, review it, make or suggest changes, and potentially merge it.
push indexing Vectorized indexing in which the value at location i in the index vector specifies an element of the result vector that gets the corresponding element of the source vector, i.e., result[index[i]] = source[i]. Push indexing can easily produce gaps and collisions.
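The rule result[index[i]] = source[i] can be sketched in plain Python to show where gaps and collisions come from. The sample vectors, the result length of 4, and the use of `None` to mark unfilled slots are assumptions for illustration:

```python
source = ["a", "b", "c"]
index = [2, 0, 2]

result = [None] * 4                 # None marks slots that nothing was pushed to
for i, target in enumerate(index):
    result[target] = source[i]      # a later push to the same slot overwrites

# Slot 2 received both "a" and "c" (a collision); slots 1 and 3 stay empty (gaps).
```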
Python Package Index The official third-party software repository for Python. Anyone can upload a package to PyPI. PyPI packages may install via executed scripts or pre-compiled, system-specific wheels.
Python Software Foundation A non-profit organization that oversees and promotes the development and use of Python.
quantile If a set of sorted values is divided into groups of equal size, each group is called a quantile. For example, if there are five groups, each is called a quintile; the bottom quintile contains the lowest 20% of the values, while the top quintile contains the highest 20%.
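Following the definition above, a small Python sketch can split ten made-up values into quintiles. It assumes the number of values divides evenly by five:

```python
values = sorted([7, 1, 9, 3, 5, 10, 2, 8, 4, 6])

group_size = len(values) // 5
quintiles = [
    values[i:i + group_size] for i in range(0, len(values), group_size)
]

bottom_quintile = quintiles[0]    # lowest 20% of the values
top_quintile = quintiles[-1]      # highest 20% of the values
```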
query string The portion of a URL after the question mark ? that specifies extra parameters for the HTTP request as name-value pairs.
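Python's standard `urllib.parse` module can extract, decode, and build query strings. The URL and parameter names below are invented for illustration:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "https://example.org/search?q=glossary&lang=en"

query = urlsplit(url).query                            # text after the question mark
params = parse_qs(query)                               # name -> list of values
rebuilt = urlencode({"q": "glossary", "lang": "en"})   # dictionary -> query string
```

Each value in `params` is a list because a name may legitimately appear more than once in a query string.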
quosure A data structure containing an unevaluated expression and its environment.
quoting function A function that is passed expressions rather than the values of those expressions.
R (programming language) A popular open-source programming language used primarily for data science.
R Consortium A group that supports the worldwide community of users, maintainers and developers of R. Its members include leading institutions and companies dedicated to the use, development, and growth of R.
R Foundation A non-profit founded by the R development core team providing support for R. It is a member of the R Consortium.
R Hub A free platform available to check an R package on several different platforms in preparation for the CRAN submission process.
raise (an exception) To signal that something unexpected or unusual has happened in a program by creating an exception and handing it to the error-handling system, which then tries to find a point in the program that will catch it.
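A short Python sketch of raising and catching an exception follows; the function `safe_reciprocal` and its error message are hypothetical names chosen for illustration.

```python
def safe_reciprocal(x):
    """Return 1 / x, raising an exception for the unusual case x == 0."""
    if x == 0:
        raise ValueError("cannot take the reciprocal of zero")
    return 1 / x

try:
    safe_reciprocal(0)
except ValueError as error:
    # The error-handling system found this point in the program to catch it.
    message = str(error)

print(message)  # cannot take the reciprocal of zero
```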
random forests An algorithm used for regression or classification that uses a collection of decision trees, called a forest. Each tree votes for a classification, and the algorithm chooses the classification having the most votes over all the trees in the forest.
reactive programming A style of programming in which actions are triggered by external events.
reactive variable A variable whose value is automatically updated when some other value or values change. Reactive variables are used extensively in Shiny.
read-eval-print loop An interactive program that reads a command typed in by a user, executes it, prints the result, and then waits patiently for the next command. REPLs are often used to explore new ideas, or for debugging.
README A plain text file containing important information about a project or software package.
reciprocal The reciprocal of a number x is 1 / x, or alternatively x raised to the power of -1.
record A group of related values that are stored together. A record may be represented as a tuple or as a row in a table; in the latter case, every record in the table has the same fields.
recurrent neural network A class of artificial neural networks where connections between nodes can create a cycle. This allows the network to exhibit behavior that is dynamic over time. This type of network is applicable to tasks like speech and handwriting recognition.
recursion Calling a function from within a call to that function, or defining a term using a simpler version of the same term.
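A classic illustration, sketched here in Python, is the factorial function, which is defined in terms of a simpler version of itself until it reaches a base case:

```python
def factorial(n):
    """Compute n! for a non-negative integer n."""
    if n <= 1:
        return 1                 # base case stops the recursion
    return n * factorial(n - 1)  # call the same function on a simpler input

print(factorial(5))  # 120
```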
redirection To send a request for a web page or web service to a different page or service.
refactoring Reorganizing software without changing its behavior.
regression testing Testing software to ensure that things which used to work have not been broken.
reinforcement learning Any machine learning algorithm which is not given specific goals to meet, but instead is given feedback on whether or not it is making progress.
relative path A path whose destination is interpreted relative to some other location, such as the current working directory. A relative path is the equivalent of giving directions using terms like “straight” and “left”.
relative row number The index of a row in a displayed portion of a table, which may or may not be the same as the absolute row number within the table.
remote login Starting an interactive session on one computer from another computer, e.g., by using SSH.
remote repository A repository located on another computer. Tools such as Git are designed to synchronize changes between local and remote repositories in order to share work.
reprex A reproducible example. When asking questions about coding problems online or filing issues on GitHub, you should always include a reprex so others can reproduce your problem and help. The reprex package can help!
reproducible example See reprex.
reproducible research The practice of describing and documenting research results in such a way that another researcher or person can re-run the analysis code on the same data to obtain the same result.
research software engineer Someone whose primary responsibility is to build the specialized software that other researchers depend on.
research software engineering The practice of and methods for building the specialized software that other researchers depend on.
reserved word A word (character string) with a distinct meaning for a programming or scripting language. Typically, reserved words cannot be used as names for variables or constants, as this would confuse the compiler or interpreter.
reStructured Text A plaintext markup format used primarily in Python documentation.
revision See commit.
right join A join that combines data from two tables, A and B. Where keys in table A match keys in table B, fields are concatenated. Where a key in table B does not match a key in table A, columns from table A are filled with null, NA, or some other missing value signifier. Keys from table A that do not exist in table B are dropped.
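The behavior can be sketched in plain Python with two small dictionaries standing in for tables A and B; the keys and values are invented, and `None` plays the role of the missing-value signifier.

```python
# Each dictionary maps a key to that table's field (illustrative data).
table_a = {1: "alpha", 2: "beta"}
table_b = {2: "second", 3: "third"}

# Keep every key from B; fill A's column with None where A has no match.
right_join = [(key, table_a.get(key), b_value)
              for key, b_value in table_b.items()]

print(right_join)  # [(2, 'beta', 'second'), (3, None, 'third')]
```

Key 1 exists only in table A, so it does not appear in the result.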
root (in a tree) The node in a tree of which all other nodes are direct or indirect children, or equivalently the only node in the tree that has no parent.
root directory The directory that contains everything else, either directly or indirectly. The root directory is written / (a bare forward slash).
rotating file A set of files used to store recent information. For example, there might be one file with results for each day of the week, so that results from last Tuesday are overwritten this Tuesday.
S4 A framework for object-oriented programming in R.
sandbox A testing environment that is separate from the production system, or an environment that is only allowed to perform a restricted set of operations for security reasons.
sanity check A basic test to see if the outcome of a calculation, script or analysis makes sense or is true. This can be performed by visualisation or by simply inspecting the outcome.
schema A specification of the format of a dataset, including the name, format, and content of each table.
scope The portion of a program within which a definition can be seen and used. See closure, global variable, and local variable.
script Originally, a program written in a language too user-friendly for “real” programmers to take seriously; the term is now synonymous with program.
search path The list of directories that a program searches to find something. For example, the Unix shell uses the search path stored in the PATH variable when trying to find a program whose name it has been given.
select To choose entire columns or rows from a table by name or location.
self join A join that combines a table with itself.
semantic versioning A standard for identifying software releases. In the version identifier major.minor.patch, major changes when a new version of software is incompatible with old versions, minor changes when new features are added to an existing version, and patch changes when small bugs are fixed.
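One practical consequence is that version identifiers should be compared as numbers rather than as text. The sketch below uses a hypothetical helper, `parse_version`, and deliberately ignores pre-release and build suffixes.

```python
def parse_version(text):
    """Split 'major.minor.patch' into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)

# Tuples compare element by element, so 1.10.0 is newer than 1.9.3.
print(parse_version("1.10.0") > parse_version("1.9.3"))  # True

# Comparing the raw strings gives the wrong answer.
print("1.10.0" > "1.9.3")  # False
```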
sense vote A preliminary vote used to determine whether further discussion is needed in a meeting.
sensitivity Statistical measure of a classification model which gives the True Positive rate. For example, the proportion of people who have a disease that test positive. Calculated as Sensitivity = TP/(TP+FN).
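The formula translates directly into Python. The counts below are made up: 40 people with the disease test positive and 10 test negative.

```python
def sensitivity(true_positives, false_negatives):
    """True positive rate: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)

print(sensitivity(40, 10))  # 0.8
```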
sequential data Any list of data items where the order is an inherent property of the list. Often the next item in the list is dependent on the previous item or items.
server Typically, a program such as a database manager or web server that provides data to a client upon request.
shader A program designed to run on the GPU. Generally used in graphics to calculate lighting or position vertices in a scene, though it can be used for more general programming through the use of compute shaders.
shebang In Unix, a character sequence such as #!/usr/bin/python in the first line of an executable file that tells the shell what program to use to run that file.
shell A command-line interface that allows a user to interact with the operating system, such as Bash (for Unix and macOS) or PowerShell (for Windows).
shell script A set of commands for the shell stored in a file so that they can be re-executed. A shell script is effectively a program.
shell variable A variable set and used in the Unix shell. Commonly used shell variables include HOME (the user’s home directory) and PATH (their search path).
Shiny An R package that makes it simple to build web applications to interactively visualise and manipulate data. Often used to make interactive graphs and tables straight from R without having to know HTML, CSS or JavaScript.
short circuit test A logical test that only evaluates as many arguments as it needs to. For example, if A is false, then most languages never evaluate B in the expression A and B.
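In Python, `and` short-circuits in this way. The sketch below records which operands were actually evaluated; the helper `check` is a hypothetical function used only for illustration.

```python
calls = []

def check(name, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(name)
    return value

result = check("A", False) and check("B", True)

print(result)  # False
print(calls)   # ['A']  (B was never evaluated)
```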
short identifier (of commit) The first few characters of a full identifier. Short identifiers are easy for people to type and say aloud, and are usually unique within a repository’s recent history.
short option A single-letter identifier for a command-line argument. Most common flags are a single letter preceded by a dash, such as -v.
side effect A change made by a function while it runs that is visible after the function finishes, such as modifying a global variable or writing to a file. Side effects make programs harder for people to understand, since the effects are not necessarily clear at the point in the program where the function is called.
signal (a condition) A way of indicating that something has gone wrong in a program, or that some other unexpected event has occurred. R prefers “signalling a condition” to “raising an exception”. Python, on the other hand, encourages raising and catching exceptions, and in some situations, requires it.
Simple Mail Transfer Protocol A standard internet communication protocol for transmitting email.
Simple Mail Transfer Protocol Secure A method for securing SMTP using TLS.
single square brackets One set of square brackets [ ], used to select a structure from another structure based on an index value, or range of values, inside the square brackets.
single-threaded A model of program execution in which only one thing can happen at a time. Single-threaded execution is easier for people to understand, but less efficient than multi-threaded execution.
singleton A set with only one element, or a class with only one instance.
Singleton pattern A design pattern that creates a singleton object to manage some resource or service, such as a database or cache. In object-oriented programming, the pattern is usually implemented by hiding the constructor of the class in some way so that it can only be called once.
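Python has no private constructors, so one common workaround (an assumption of this sketch, not the only approach) is to override `__new__` so that every call returns the same instance. The class name `Cache` and its `store` attribute are hypothetical.

```python
class Cache:
    """A cache that exists at most once per program."""
    _instance = None

    def __new__(cls):
        # Create the single instance on first use, then keep reusing it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.store = {}
        return cls._instance

first = Cache()
second = Cache()
first.store["answer"] = 42

print(first is second)         # True
print(second.store["answer"])  # 42
```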
slug An abbreviated portion of a page’s URL that uniquely identifies it. In the example https://www.mysite.com/category/post-name, the slug is post-name.
snake case See pothole case.
software distribution A set of programs that are built, tested, and distributed as a collection so that they can run together.
source code Source code, or simply code, is the human-readable text from which executed code is produced, by means of either an interpreter or a compiler. It is the series of commands, written primarily by people, that makes up a program. (Note: automatic code generators exist for some applications.)
source distribution A software distribution that includes the source code, typically so that programs can be recompiled on the target computer when they are installed.
specificity Statistical measure of a classification model which gives the True Negative rate. For example, the proportion of people who do not have a disease that test negative. Calculated as Specificity = TN/(TN+FP).
spectral analysis Estimating, from a finite record of a stationary data sequence, how the total power is distributed over frequency. See also “spectrum analysis problem”.
sprint A short, intense period of work on a project.
square root A special case of the n-th root for which n = 2, i.e., the 2nd root has the special name “square root”.
SSH key A string of random bits stored in a file that is used to identify a user for SSH. Each SSH key has separate public and private parts; the public part can safely be shared, but if the private part becomes known, the key is compromised.
stack frame A section of the call stack that records details of a single call to a specific function.
stale (in build) To be out-of-date compared to a prerequisite. A build manager’s job is to find and update things that are stale.
standard error A predefined communication channel for a process, typically used for error messages.
standard input A predefined communication channel for a process, typically used to read input from the keyboard or from the previous process in a pipe.
standard output A predefined communication channel for a process, typically used to send output to the screen or to the next process in a pipe.
stratified sampling Selecting values by dividing the overall population into homogeneous groups and then taking a random sample from each group.
stream A sequential flow of data, such as the bits arriving across a network connection or the bytes read from a file.
string interpolation The process of inserting text corresponding to specified values into a string, usually to make output human-readable.
student's t-distribution See t-distribution.
subcommand A command that is part of a larger family of commands. For example, git commit is a subcommand of Git.
subdirectory A directory that is below another directory.
support vector machine A supervised learning algorithm that seeks to divide points in a dataset so that the empty space between the resultant sets is as wide as possible.
synchronous To happen at the same time. In programming, synchronous operations are ones that have to run simultaneously, or complete at the same time.
systematic error See bias.
t-distribution A variation on the normal distribution that is adjusted to account for estimating variance from the sample instead of knowing it in advance.
tab completion A technique implemented by most REPLs, shells, and programming editors that completes a command, variable name, filename, or other text when the TAB key is pressed.
table A set of records in a relational database or observations in a data frame. Tables are usually displayed as rows (each of which represents one record or observation) and columns (each of which represents a field or variable).
tag (in version control) A readable label attached to a specific commit so that it can easily be referenced later.
Template Method pattern A design pattern in which a parent class defines an overall sequence of operations by calling abstract methods that child classes must then implement. Each child class then behaves in the same general way, but implements the steps differently.
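A minimal Python sketch of the pattern follows, using the standard `abc` module. The class names `Report` and `PlainReport` and their methods are invented for illustration.

```python
from abc import ABC, abstractmethod

class Report(ABC):
    def render(self):
        # The parent fixes the overall sequence of operations.
        return f"{self.header()}\n{self.body()}"

    @abstractmethod
    def header(self):
        """Child classes must supply the header step."""

    @abstractmethod
    def body(self):
        """Child classes must supply the body step."""

class PlainReport(Report):
    def header(self):
        return "REPORT"

    def body(self):
        return "nothing to report"

print(PlainReport().render())
```

A different child class could implement `header` and `body` in another way while inheriting the same `render` sequence.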
ternary expression An expression that has three parts. Conditional expressions are the only ternary expressions in most languages.
tessellation shader The shader stage in the rendering pipeline dedicated to subdividing primitives to increase the resolution of a mesh without impacting memory. Not to be confused with geometry shaders, which change the overall shape.
test data Test data is a portion of a dataset used to evaluate the correctness of a machine learning algorithm after it has been trained. It should always be separated from the training data to ensure that the model is properly tested with unseen data.
test runner A program that finds and runs software tests and reports their results.
test-driven development A programming practice in which tests are written before a new feature is added or a bug is fixed in order to clarify the goal.
three Vs The volume, velocity, and variety that distinguish big data.
throw (exception) Another term for raising an exception.
ticket See issue.
ticketing system See issue tracking system.
Tidymodels A collection of R packages for modeling and statistical analysis designed with a shared philosophy.
time series A set of measurements taken at different times, which may or may not be regular intervals.
timestamp A digital identifier showing the time at which something was created or accessed. Timestamps should use ISO date format for portability.
tolerance How closely the actual result of a test must agree with the expected result in order for the test to pass. Tolerances are usually expressed in terms of relative error.
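As a sketch, Python's standard `math.isclose` compares two numbers using a relative tolerance. The tolerance value here is an arbitrary choice for illustration.

```python
import math

actual = 0.1 + 0.2  # floating-point arithmetic introduces a tiny error
expected = 0.3

exact_match = actual == expected                                 # False
within_tolerance = math.isclose(actual, expected, rel_tol=1e-9)  # True

print(exact_match, within_tolerance)  # False True
```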
training data Training data is a portion of a dataset used to train a machine learning algorithm to recognise similar data. It should always be separated from the test data to ensure that the model is properly tested with data it has never seen before.
transitive dependency If A depends on B and B depends on C, C is a transitive dependency of A.
Transport Layer Security A cryptographic protocol for securing communications over a computer network.
tree A graph in which every node except the root has exactly one parent.
triage To go through the issues associated with a project and decide which are currently priorities. Triage is one of the key responsibilities of a project manager.
true The logical (Boolean) state opposite of “false”. Used in logic and programming to represent a binary state of something.
truthy Evaluating to true in a Boolean context.
tuple A data type that has a fixed number of parts, such as the three color components of a red-green-blue color specification. In Python, tuples are immutable (their values cannot be changed after creation).
two hard problems in computer science Refers to a quote by Phil Karlton: “There are only two hard problems in computer science: cache invalidation and naming things.” Many variations add a third problem (most often “off-by-one errors”).
type coercion To convert data from one type to another, e.g., from the integer 4 to the equivalent floating point number 4.0.
Unicode A standard that defines numeric codes for many thousands of characters and symbols. Unicode does not define how those numbers are stored; that is done by standards like UTF-8.
Uniform Resource Locator A unique address on the World-Wide Web. URLs originally identified web pages, but may also represent datasets or database queries, particularly if they include a query string.
unit test A test that exercises one function or feature of a piece of software and produces pass, fail, or error.
up-vote A vote in favor of something.
update operator See in-place operator.
upstream repository The remote repository from which this repository was derived. Programmers typically save changes in their own repository and then submit a pull request to the upstream repository where changes from other programmers are also collected.
UTF-8 A way to store the numeric codes representing Unicode characters in memory that is backward-compatible with the older ASCII standard.
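The distinction between a Unicode code and its UTF-8 storage can be seen in a short Python sketch; the character “é” is just an example.

```python
text = "é"

code_point = ord(text)          # the Unicode numeric code for the character
encoded = text.encode("utf-8")  # how UTF-8 stores that code as bytes

# Plain ASCII characters are stored identically under both standards.
ascii_same = "A".encode("utf-8") == "A".encode("ascii")

print(code_point)  # 233
print(encoded)     # b'\xc3\xa9'
print(ascii_same)  # True
```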
variable (data) Some attribute of a population that can be measured or observed.
variable arguments In a function, the ability to take any number of arguments. R uses ... to capture the “extra” arguments. Python uses *args and **kwargs to capture unnamed, and named, “extra” arguments, respectively.
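A brief Python sketch shows both mechanisms; `describe` is a hypothetical function name.

```python
def describe(*args, **kwargs):
    """Return the extra unnamed and named arguments unchanged."""
    return args, kwargs

positional, named = describe(1, 2, sep=",")

print(positional)  # (1, 2)
print(named)       # {'sep': ','}
```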
vertex shader The shader stage in the rendering pipeline dedicated to handling operations on individual vertices in a scene. A vertex shader can be used to calculate properties of a single vertex, such as position and per-vertex lighting. Not to be confused with fragment shaders, which are used to determine the actual colour being rendered to each pixel of the screen.
Visitor pattern A design pattern in which the operation to be done is taken to each element of a data structure in turn. It is usually implemented by having a generator “visitor” that knows how to reach the structure’s elements, which is given a function or method to call for each in turn, and that carries out the specific operation.
walk (a tree) To visit each node in a tree in some order, typically depth-first or breadth-first.
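The two orders can be compared with a small Python sketch. Representing each node as a `(label, children)` pair is an assumption made for this example.

```python
from collections import deque

# Each node is (label, list_of_children); this layout is illustrative.
tree = ("a", [("b", [("d", [])]), ("c", [])])

def depth_first(node):
    """Visit a node, then fully explore each child in turn."""
    label, children = node
    yield label
    for child in children:
        yield from depth_first(child)

def breadth_first(root):
    """Visit nodes level by level using a queue."""
    queue = deque([root])
    while queue:
        label, children = queue.popleft()
        yield label
        queue.extend(children)

print(list(depth_first(tree)))    # ['a', 'b', 'd', 'c']
print(list(breadth_first(tree)))  # ['a', 'b', 'c', 'd']
```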
whitespace The space, newline, carriage return, and horizontal and vertical tab characters that take up space but do not create a visible mark. The name comes from their appearance on a printed page in the era of typewriters.
wildcard A character expression that can match text, such as the * in *.csv (which matches any filename whose name ends with .csv).
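Python's standard `fnmatch` module applies shell-style wildcards to names. The filenames below are invented for illustration.

```python
import fnmatch

names = ["data.csv", "notes.txt", "results.csv"]  # hypothetical filenames

# Keep only the names that match the wildcard pattern.
csv_files = fnmatch.filter(names, "*.csv")

print(csv_files)  # ['data.csv', 'results.csv']
```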
workflow A way of describing work to be done as a set of tasks, typically with dependencies on external inputs or the outputs of other tasks, which can later be executed by a program. An example is a Makefile which can be executed by the make Unix command.

+

+1

A vote in favor of something.
Afrikaans, Deutsch, English, Français

A

abandonware

Software whose maintenance has been abandoned.
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, Bangla, Deutsch, English, Français, Bahasa Indonesia, Italiano, Português, Setswana

aggregation (agregación)

Combining many values into one, for example by summing a series of numbers or concatenating a set of character strings.
Afrikaans, اَلْعَرَبِيَّةُ, Deutsch, English, Français, Português, Kiswahili, Setswana

algorithm (algoritmo)

An algorithm is a set of steps, instructions, or rules to be followed to carry out a specific task. In computing, an algorithm is a set of instructions in a computer program that solves a computational problem.
Afrikaans, اَلْعَرَبِيَّةُ, Deutsch, Ελληνικά, English, Français, Bahasa Indonesia, Italiano, Português, Kiswahili, Українська, IsiZulu

aliasing

Having two or more references to the same object, for example as a data structure in memory or as a file saved on disk. The Spanish term is solapamiento.
Afrikaans, Deutsch, English, Français, Kiswahili, Setswana

anchor (ancla)

In a regular expression, a symbol that fixes a position without matching any characters. ^ matches the start of a line, $ matches the end, and \b matches a word boundary.
Afrikaans, English, Français, Português
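The three regular-expression anchors can be tried out with Python's standard `re` module; the sample strings are made up for illustration.

```python
import re

# ^ only matches at the start of the text.
starts = re.search(r"^data", "data file") is not None  # True
middle = re.search(r"^data", "my data") is not None    # False

# $ only matches at the end of the text.
ends = re.search(r"file$", "data file") is not None    # True

# \b requires a word boundary, so the "cat" inside "concat" is skipped.
words = re.findall(r"\bcat\b", "cat concat cat")

print(starts, middle, ends)  # True False True
print(words)                 # ['cat', 'cat']
```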

anti join

A join that keeps the rows of table A whose keys do not match those of table B.
anti join, cross_join, full_join, inner_join, left_join, right_join, self_join
Afrikaans, አማርኛ, English, Français

unsupervised learning (aprendizaje no supervisado)

A class of algorithms that learn patterns from unannotated (unlabeled) data.
supervised learning, reinforcement_learning
English

deep learning (aprendizaje profundo)

A family of neural network algorithms that use multiple layers to extract features from data at successively higher levels of abstraction.
አማርኛ, Deutsch, English, Kiswahili

supervised learning (aprendizaje supervisado)

A class of algorithms in which the system learns patterns from annotated (labeled) training data.
unsupervised learning, reinforcement_learning
English

argument (argumento)

Not to be confused with, and not a synonym of, parameter. An argument is one of possibly several expressions passed to a function; it is the actual value that is passed. Parameters and arguments are distinct but related concepts: parameters are variables, and arguments are the values assigned to those variables.
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, Deutsch, English, Français, Português, Kiswahili, Українська

ASCII

A standard way of representing the characters commonly used in Western European languages as 7- or 8-bit integers, now superseded by Unicode.
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, Deutsch, English, Français, Kiswahili, Українська

asynchronous (asincrónico)

Not happening at the same time. In programming, an asynchronous operation is one that runs independently of another, or that starts at one time and finishes at another.
synchronous
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, Deutsch, Ελληνικά, English, Français, Kiswahili, Українська

attribute (atributo)

A name-value pair associated with an object, used to store metadata about the object such as the dimensions of an array.
Afrikaans, አማርኛ, Deutsch, English, Français, Bahasa Indonesia, Kiswahili, Setswana

auto-completion (auto-completar)

A feature that lets a user finish a word or piece of code quickly by pressing the TAB key, which brings up a list of possible words or code from which the user can select the one they need.
Afrikaans, አማርኛ, Deutsch, English, Français, Português, Kiswahili

autocorrelation (autocorrelación)

The degree of similarity between observations in the same time series that are separated by a time interval (known as the “lag”). Among other uses, autocorrelation analysis can reveal more about time series datasets by detecting repeating patterns that may be partially hidden by random noise.
Afrikaans, Deutsch, English, Français, Kiswahili

B

relational database (base de datos relacional)

A database that organizes information into tables, each of which has a fixed set of named fields (shown as columns) and a variable number of records (shown as rows).
SQL
English, Français, Kiswahili

library (biblioteca)

A reusable software package, also called a module.
አማርኛ, Deutsch, English, Français, Português

Big Data

Any data that until recently was too large for most people to work with on a single computer.
three_vs
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, Deutsch, English, Français, Kiswahili, Setswana, Українська, IsiZulu

bit

A unit of information representing yes/no or true/false alternatives. In computing, a state that is either 0 or 1.
binary system, Boolean
Afrikaans, አማርኛ, Deutsch, English, Français, Bahasa Indonesia, Português, Nederlands, Kiswahili, Setswana, Українська

Boolean (Booleano)

Relating to a variable or data type that can take a logical value, which is either true or false. The term “Boolean” honors George Boole, a 19th-century mathematician. Computers are built on the binary system, in which states are evaluated as either true or false.
truthy, falsy, binary system
አማርኛ, اَلْعَرَبِيَّةُ, Deutsch, English, Bahasa Indonesia, Português, Kiswahili, Українська

bug

A missing or undesired feature of a piece of software.
Afrikaans, አማርኛ, Deutsch, English, Setswana, Українська

byte

A unit of digital information that typically consists of eight binary digits, or bits.
Deutsch, English, Français, Setswana, Українська

C

cache

Something that stores copies of data so that future requests can be answered faster. A computer's CPU uses a hardware cache to hold recently accessed values; many programs rely on a software cache to reduce network traffic and latency. Figuring out when something in a cache has become stale and must be replaced is one of the two hard problems in computer science.
አማርኛ, Deutsch, English, Українська

caching

Storing a copy of some data in a local cache to make future access faster.
English, Українська

hidden layer (deep learning) (capa oculta)

A hidden layer in a neural network is a layer of neurons that is not directly connected to the input or output. The layers are “hidden” because you cannot directly observe the values going in and out of them.
neural network, machine_learning, deep learning, perceptron
English

folder (carpeta)

Another term for a directory.
አማርኛ, Deutsch, English, Kiswahili, Українська

CC-0

A Creative Commons license that imposes no restrictions whatsoever, thereby placing the work in the public domain.
አማርኛ, Deutsch, English, Português, Українська

loop (ciclo)

Also known in Spanish as a bucle. A structure that executes a block of code repeatedly until an exit or termination condition is met.
for loop, while loop, infinite_loop, loop_body
English, Українська

for loop (ciclo for)

Also known in Spanish as a bucle for. A structure within a program that repeats one or more instructions (the loop body) once for each element of a sequence, such as each number in a range or each element of a list.
while loop
አማርኛ, English

while loop (ciclo while)

Also known in Spanish as a bucle while. A structure within a program that repeats one or more instructions (the loop body) while a condition is true.
for loop
English

data science (ciencia de datos)

The combination of statistics, programming, and hard work used to extract information from data.
Afrikaans, አማርኛ, Ελληνικά, English, Português, Kiswahili, Українська

data scientist (científico/científica de datos)

Someone who uses programming skills to solve statistical problems.
Afrikaans, አማርኛ, Ελληνικά, English, Português, Kiswahili, Українська

class (clase)

In object-oriented programming, a structure that combines data and operations (called methods). The program uses a constructor to create an object with those properties and methods. Programmers generally define generic or reusable behavior in superclasses and more specific or detailed behavior in subclasses.
አማርኛ, English, Українська

classification (clasificación)

The process of identifying which predefined category an object belongs to, such as deciding whether an email message is spam or not. Many machine learning algorithms perform classification.
supervised learning, clustering
አማርኛ, English

character encoding (codificación de caracteres)

A specification of how characters are stored as bytes. The most widely used encoding today is UTF-8.
አማርኛ, اَلْعَرَبِيَّةُ, English

high-performance computing (computación de alto rendimiento)

An approach that uses powerful processors, usually working in parallel, to analyze data. Used appropriately, it reduces analysis time compared with a personal computer and makes it possible to explore large collections of data.
English

concatenate (concatenar)

In Python, to add or stack either columns (axis=1) or rows (axis=0) by joining data end to end.
aggregation, left_join, right_join
Afrikaans, English

constant (constante)

In programming, a name associated with a value that never changes while a program runs. The value of a constant can be read but cannot be changed once it has been defined, in contrast to a variable.
اَلْعَرَبِيَّةُ, English, Français, Kiswahili

constructor

A function that creates an object of a particular class. In the S3 object system, constructors are more a convention than a requirement.
አማርኛ, English

correlation (correlación)

A measure of how well two variables agree with each other. Correlation is usually measured by computing a correlation coefficient, and does not imply causation.
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, English, Français, Português, Українська

ROC curve (curva ROC)

A ROC curve (short for Receiver Operating Characteristic) is a plot showing the performance of a binary classifier at different classification thresholds. The curve is obtained by plotting the true positive rate (also known as recall or sensitivity) along the vertical axis and the false positive rate along the horizontal axis.
machine_learning, classification
English

D

tidy data (datos ordenados)

Tabular data that satisfies three conditions that make initial cleaning and later exploration and analysis easier: (1) each variable forms a column, (2) each observation forms a row, and (3) each type of observational unit forms a table.
table
English

decrement (decrementar)

A unary operation that decreases the value of a variable, usually by 1.
increment
English, Português, Kiswahili

agile development (desarrollo ágil)

A software development methodology that emphasizes many small steps and continuous feedback instead of up-front, long-term planning. Exploratory programming is often agile.
Afrikaans, Deutsch, English, Français, Português, Kiswahili, Setswana, Українська

gradient descent (descenso del gradiente)

An optimization algorithm that repeatedly calculates the gradient at the current point, takes a small step in the downhill direction (opposite to the gradient), and then recalculates the gradient.
backpropagation
አማርኛ, English
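The gradient descent loop can be sketched in a few lines of Python. The objective f(x) = (x - 3)^2, the starting point, the learning rate, and the number of steps are all assumptions chosen for illustration.

```python
def gradient(x):
    # Derivative of the hypothetical objective f(x) = (x - 3) ** 2.
    return 2 * (x - 3)

x = 0.0              # starting point
learning_rate = 0.1  # size of each small step

for _ in range(100):
    # Step against the gradient, then recompute it on the next pass.
    x -= learning_rate * gradient(x)

print(round(x, 6))  # 3.0
```

The value of x approaches 3, where this particular objective reaches its minimum.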

standard deviation (desviación estándar)

How much the values in a dataset differ from the mean. It is calculated as the square root of the variance.
68-95-99.7 rule
Deutsch, English, Italiano, Português, Українська
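
A short worked example may help here. It assumes the population form of the variance (dividing by n rather than n - 1), and the data values are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values

mean = sum(data) / len(data)
# Population variance: average squared difference from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)
# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# The standard library computes the same quantity.
library_std_dev = statistics.pstdev(data)
```

For this dataset the mean is 5, the variance is 4, and the standard deviation is 2.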

diccionario

A data structure that holds key-value pairs, sometimes called an associative array. Dictionaries are often implemented using hash tables.
አማርኛ, English, Українська

directorio

An item within a file system that contains files and other directories. Also known as a folder.
English, Kiswahili, Українська

distribución normal estándar

A normal distribution with a mean of 0 and a standard deviation of 1. Values from normal distributions with other parameters can easily be rescaled to follow a standard normal distribution.
English, Português

E

Entorno de Desarrollo Integrado

An application that helps programmers develop software. Integrated development environments (IDEs) usually have a built-in editor, a console that runs code immediately, and browsers for exploring data structures in memory and files on disk.
repl
አማርኛ, English

entorno virtual

In Python, the virtualenv package creates virtual Python environments that hold the packages, and the versions of those packages, needed for a particular project or task without affecting other virtual environments or the system's default environment.
English

epoch (aprendizaje profundo)

In deep learning, an epoch is one complete cycle of training in which all of the training data has been fed to the algorithm once. Training a deep neural network consists of several epochs.
Aprendizaje profundo, retropropagación, perceptrón, red neuronal, machine_learning
English

error absoluto

The absolute value of the difference between the observed value and the correct value. Absolute error is usually less useful than relative error.
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, Deutsch, English, Français, Bahasa Indonesia, Italiano, Português, Kiswahili

error absoluto medio

The average of the absolute differences between all predicted values and the corresponding actual values.
error cuadrático medio, raíz del error cuadrático medio
አማርኛ, English

error cuadrático medio

The average of the squares of all the errors between predicted values and actual values. Squaring makes larger errors count for more, which makes this measure more popular than mean absolute error.
raíz del error cuadrático medio
አማርኛ, English
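
The three related error measures (mean absolute error, mean squared error, and root mean squared error) can be compared side by side. The values in `actual` and `predicted` are made up for this sketch.

```python
import math

actual = [3, 5, 2, 7]       # made-up true values
predicted = [2, 5, 4, 7]    # made-up model predictions

errors = [p - a for p, a in zip(predicted, actual)]

# Mean absolute error: average size of the errors.
mae = sum(abs(e) for e in errors) / len(errors)
# Mean squared error: average of the squared errors, so large errors weigh more.
mse = sum(e ** 2 for e in errors) / len(errors)
# Root mean squared error: back in the same units as the data.
rmse = math.sqrt(mse)
```

Here the errors are -1, 0, 2, and 0, so the mean absolute error is 0.75 and the mean squared error is 1.25; the single error of size 2 contributes most of the squared total.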

error relativo

The absolute value of the difference between the observed value and the correct value, divided by the correct value. For example, if the observed value is 9 and the correct value is 10, the relative error is 0.1. Relative error is usually more useful than absolute error.
English, Français, Italiano, Português
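
The example in the definition can be reproduced directly. The helper names `absolute_error` and `relative_error` are hypothetical, and the sketch assumes the correct value is not zero.

```python
def absolute_error(observed, correct):
    """Size of the difference between observed and correct values."""
    return abs(observed - correct)

def relative_error(observed, correct):
    """Absolute error as a fraction of the correct value (which must be nonzero)."""
    return abs(observed - correct) / abs(correct)

abs_err = absolute_error(9, 10)   # 1
rel_err = relative_error(9, 10)   # 0.1
```

The same absolute error of 1 would give a relative error of only 0.001 if the correct value were 1000, which is why the relative form is often more informative.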

escalar

A single value of a particular type, such as 1 or “a”. Scalars do not really exist in R; values that appear to be scalars are actually vectors of length one.
English

expresión regular

A pattern for matching text, itself written as text. Regular expressions are sometimes called “regexp”, “regex”, or “RE”, and are powerful tools for working with text.
English
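
As a small illustration, the sketch below uses Python's standard `re` module to find dates written as year-month-day. The pattern and the sample text are made up for the example.

```python
import re

# Four digits, a hyphen, two digits, a hyphen, two digits.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

text = "Released 2021-03-15, patched 2021-04-02."
dates = date_pattern.findall(text)
# dates == ['2021-03-15', '2021-04-02']
```

The pattern is itself ordinary text, which is what lets it be stored, passed around, and combined like any other string.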

expresión unaria

An expression with a single argument, such as log 5.
binary_expression, nullary_expression, ternary_expression
English

F

fallar (una prueba)

A test fails if the actual result does not match the expected result.
pasar (una prueba)
አማርኛ, English, Kiswahili, Setswana

FASTA

A file format for storing genomic or amino acid sequence information. The information for each sequence is split across two lines. Line 1 holds information about the sequence and begins with the symbol ‘>’ (greater than). Line 2 holds the genomic or amino acid sequence using single-letter codes.
English, Kiswahili
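
A minimal reader for the layout described above might look like the following. The function `parse_fasta` and the sample records are hypothetical; as an added assumption, the sketch also joins sequences that continue over more than one line.

```python
def parse_fasta(text):
    """Map each header (without the leading '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                    # skip blank lines
        if line.startswith(">"):
            header = line[1:]           # start a new record
            records[header] = ""
        elif header is not None:
            records[header] += line     # append sequence data to the current record
    return records

sample = ">seq1 example\nACGT\n>seq2 example\nTTGA\n"   # made-up sequences
records = parse_fasta(sample)
# records == {'seq1 example': 'ACGT', 'seq2 example': 'TTGA'}
```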

función anónima

A function that has not been assigned a name. Anonymous functions are usually short and are defined where they are used, for example as callbacks. In Python, these functions are known as lambda functions and are created with the reserved word lambda.
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, Deutsch, English, Français
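
A short Python sketch of the idea follows: the lambda is defined at the point where it is passed to `sorted` as a callback. The list of words is made up for the example.

```python
words = ["pear", "fig", "banana"]

# The lambda has no name; it is defined right where it is used as the sort key.
by_length = sorted(words, key=lambda word: len(word))
# by_length == ['fig', 'pear', 'banana']

# The equivalent named function, for comparison.
def word_length(word):
    return len(word)

same_result = sorted(words, key=word_length)
```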

función de agregación

A function that combines many values into one, such as sum or max.
Afrikaans, اَلْعَرَبِيَّةُ, Deutsch, English, Français, Bahasa Indonesia, Português, Setswana

función genérica

A collection of functions with similar purposes, each operating on a different class of data.
አማርኛ, English, Français, Português

G

Literally “migas”, “migas de pan”, or “migaja” (crumbs); in this context “árbol de navegación”, “guías de navegación”, “navegación de migas”, or “rastro de navegación” (a navigation trail). Breadcrumbs are a group of navigation links included on many websites, usually placed at the top of a page, that show users where the page sits within the site. The word comes from a fairy tale in which children left a trail of bread crumbs behind them so that they could find their way home.
Deutsch, English, Українська

I

imagen rasterizada

An image stored as a matrix of pixels.
English

importar

To bring things from a module into a program for use. In most languages, a program can only import things that the module explicitly exports.
አማርኛ, English

incrementar

A unary operation that increases the value of a variable, usually by 1.
decrementar
English, Português

instalación global

Installing a package in a location where it can be accessed by all users and projects.
instalación local
አማርኛ, English, Français

instalación local

Placing a package inside a particular project so that it is accessible only within that project.
instalación global
አማርኛ, English, Français

Interfaz de Programación de Aplicaciones

A set of functions and procedures provided by a software library or web service through which another application can communicate with it. An API is not the code, the database, or the server; it is the access point.
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, Deutsch, English, Français, Kiswahili, Setswana, Українська

interfaz de usuario

The means by which a user and a machine interact. Interaction may take place through text (a command-line interface), through graphics and windows (a graphical user interface), or through other methods such as voice-driven interfaces.
cli, gui
English

L

licencia abierta

A license that permits general reuse, such as the MIT License or the GPL for software and CC-BY or CC-0 for data, prose, or other creative works.
English, Українська

lingüística computacional

The study or application of computational methods for processing or understanding human languages. Early approaches were algorithmic; most modern approaches are statistical.
procesamiento de lenguaje natural
አማርኛ, English, Kiswahili

lsof

A UNIX command that lists the open files currently in use by processes.
English

M

máquina virtual

A program that pretends to be a computer. This may seem redundant, but virtual machines (VMs) are quick to create and start, and changes made inside a VM stay contained within it, so new packages can be installed or a different operating system can be run without affecting the underlying computer.
English

marco de datos

A data frame is a two-dimensional data structure for storing tabular data in memory. Rows represent records and columns represent variables.
datos ordenados
English, Français, Kiswahili

media aritmética

See media (mean). It is calculated for a set of n numbers by summing those numbers and dividing the result by n.
Afrikaans, አማርኛ, English, Français, Italiano, Português, Kiswahili, Українська

mediana

A value that separates the upper and lower halves of a sorted dataset. The median often gives a better idea of what is typical of the dataset than the mean, which can be influenced by a small number of outliers. If the dataset contains an even number of elements, the median is the average of the two central elements.
moda
Afrikaans, አማርኛ, English
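
The rule for odd and even numbers of elements can be sketched as below. The function `median` is written out by hand for illustration (Python's standard library offers `statistics.median`), and the data are made up.

```python
def median(values):
    """Middle value of the sorted data, averaging the two central values if needed."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

incomes = [1, 2, 3, 100]                  # made-up data with one outlier
typical = median(incomes)                 # 2.5
average = sum(incomes) / len(incomes)     # 26.5, pulled upward by the outlier
```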

método

An implementation of a generic function that handles objects of a specific class.
አማርኛ, English

método abstracto

In object-oriented programming, a method that is defined but not implemented. Programmers define an abstract method in a superclass to specify operations that subclasses must provide.
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, Deutsch, Ελληνικά, English, Français, Bahasa Indonesia, Italiano, Português
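
In Python, one way to express this is with the standard `abc` module. The classes `Shape` and `Square` below are hypothetical names used only for the sketch.

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self):
        """Defined here, but every concrete subclass must implement it."""

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        # The subclass provides the operation the superclass only declared.
        return self.side ** 2

square_area = Square(3).area()   # 9
```

Trying to create a `Shape` directly raises a `TypeError`, because its abstract method has no implementation.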

moda

The value that occurs most frequently in a dataset.
mean, mediana
Afrikaans, አማርኛ, Deutsch, English

módulo

A reusable software package, also called a library.
Afrikaans, English, Français, isiXhosa

N

nodo

An element of a graph that is connected to other nodes by edges. Nodes usually have data associated with them, such as names or weights.
Afrikaans, English

número de fila absoluto

The sequential index that identifies a row in a table, regardless of which sections of the table are being displayed.
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, Deutsch, Ελληνικά, English, Français, עִברִית, Bahasa Indonesia, Italiano, Português

número pseudo-aleatorio

A value generated in a repeatable way that resembles the true randomness of the universe well enough to fool mere mortal observers.
Afrikaans, English

numpy

An open-source Python package for working with arrays, vectors, and N-dimensional matrices in a manner and with a syntax comparable to MATLAB. It provides sophisticated functions and operations focused on multidimensional arrays, linear algebra, Fourier transforms, and random number generation.
Python
English, Português

O

objeto

In object-oriented programming, a structure that holds the data of a specific instance of a class. The operations these objects can perform are defined by the methods of the class.
English

P

pandas

An open-source Python package that provides fast, flexible, and expressive data structures designed to make working with structured and time-series data easy and intuitive. It is used as a powerful tool for data analysis and manipulation.
Python
English, Português, Українська

pasar (una prueba)

A test passes if the actual result matches the expected result.
fallar (una prueba)
English

perceptrón

The simplest kind of neural network, which approximates a single neuron with N binary inputs by computing a weighted sum of its inputs and activating if that value is greater than or equal to zero.
English
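
The weighted-sum rule can be sketched as follows. The function `perceptron`, the hand-picked weights, and the extra `bias` term are assumptions made for this illustration; the bias simply shifts the point at which the sum reaches zero.

```python
def perceptron(inputs, weights, bias):
    """Fire (return 1) when the weighted sum of the inputs plus the bias is >= 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0

# Hand-picked weights that make the perceptron behave like logical AND:
# the sum reaches zero only when both binary inputs are 1.
weights = [1, 1]
bias = -2
outputs = [perceptron([a, b], weights, bias) for a in (0, 1) for b in (0, 1)]
# outputs == [0, 0, 0, 1]
```
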
Short for “permanent link” (permalink): a URL that is intended to last forever.
English

precisión

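Accuracy is a statistical measure of a classification model that gives the proportion of correct predictions out of the total number of cases. It is calculated as Accuracy = (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.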
Afrikaans, Deutsch, English, Français, Kiswahili, Українська
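
The formula can be checked with a small worked example. The function name `accuracy` and the counts passed to it are made up for this sketch.

```python
def accuracy(tp, tn, fp, fn):
    """Proportion of correct predictions: (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Made-up confusion-matrix counts for 100 cases.
score = accuracy(tp=40, tn=50, fp=5, fn=5)   # 90 correct out of 100 -> 0.9
```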

procesamiento de lenguaje natural

See lingüística computacional (computational linguistics).
English

programación exploratoria

A software development method in which requirements emerge or change as the software is being written, often in response to results from earlier runs.
Afrikaans, አማርኛ, English, Kiswahili

programación literaria

A programming paradigm that combines code and documentation in a single file.
computational_notebook, R Markdown
አማርኛ, Deutsch, English, Français

programación orientada a objetos

A programming paradigm in which data (attributes) and functions (methods) are grouped into objects that interact with one another through well-defined interfaces.
English, Português, Українська

Python

A popular interpreted, open-source programming language that relies on indentation to define control structure.
English, Français, Italiano, Português

R

R base

The basic functions that make up the R programming language. The base packages can be found in src/library and are not updated outside of R; their version number matches that of R. Base R packages are installed and loaded along with R, while priority packages are installed with R but must be loaded before use.
Tidyverse
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, English, Français, Português

R Markdown

A dialect of Markdown that lets authors mix prose and code (usually written in R) in a single document.
computational_notebook, programación literaria
English, Français

raíz del error cuadrático medio

The square root of the mean squared error. Like the standard deviation, it is in the same units as the original data.
error absoluto medio
English

reciclar

Recycling means reusing values from a shorter vector in order to produce a sequence of the same length as the longer vector.
English
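
R applies recycling automatically in vector arithmetic. Python does not (its built-in `zip` stops at the shorter sequence), so the sketch below imitates the behavior explicitly with `itertools`. The helper `recycle` and the sample values are hypothetical.

```python
from itertools import cycle, islice

def recycle(short, length):
    """Repeat the values of `short` until `length` values have been produced."""
    return list(islice(cycle(short), length))

long_values = [1, 2, 3, 4, 5, 6]
short_values = [10, 20]

# Stretch the shorter sequence to match the longer one, then add element by element.
stretched = recycle(short_values, len(long_values))   # [10, 20, 10, 20, 10, 20]
sums = [a + b for a, b in zip(long_values, stretched)]
# sums == [11, 22, 13, 24, 15, 26]
```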

Red Bayesiana

A graph that represents the relationships among the random variables for a given problem.
Teorema de Bayes, markov_chain, naive_bayes_classifier
Afrikaans, አማርኛ, English

red neuronal

One of a large family of algorithms used to identify patterns in data by mimicking the way neurons interact in the brain. A neural network consists of one or more layers of nodes, each of which is connected to nodes in the previous layer and in the next layer. If enough of a node's inputs are active, that node activates as well.
Aprendizaje profundo, retropropagación, perceptrón
English

regla 68-95-99.7

Expresses the fact that, for a normal distribution, about 68% of values lie within one standard deviation of the mean, 95% lie within two, and 99.7% lie within three. Conversely, about 0.3% of values lie more than three standard deviations above or below the mean.
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, Deutsch, English, Français, Bahasa Indonesia, Italiano, Português
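
For a normal distribution, the fraction of values within k standard deviations of the mean equals erf(k / sqrt(2)), which can be evaluated with Python's standard `math` module. The function name `fraction_within` is hypothetical.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

coverage = {k: fraction_within(k) for k in (1, 2, 3)}
# The three values are roughly 0.68, 0.95, and 0.997.

beyond_three = 1 - coverage[3]   # roughly 0.003, the "0.3%" in the rule
```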

repositorio

A place where a version control system stores the files that make up a project and the metadata that describes their history.
git, github
English, 日本語, Português, Українська

resultado esperado (de una prueba)

The value that a piece of code is supposed to produce when tested in a certain way, or the state in which it is supposed to leave the system.
resultado real (de una prueba)
አማርኛ, English, Kiswahili

resultado real (de una prueba)

The value generated by running code in a test. If it matches the expected result, the test passes; if the two differ, the test fails.
Afrikaans, اَلْعَرَبِيَّةُ, Deutsch, English, Français, Bahasa Indonesia, Kiswahili, Setswana

retrocompatible

A property of a system, hardware, or software that allows interoperability with an older legacy system or with input designed for such a system. For example, a function written in Python 3 that can be run successfully in Python version 2 is backward-compatible.
Afrikaans, አማርኛ, Deutsch, English, Français, Português, Українська

retropropagación

An algorithm that iteratively adjusts the weights used in a neural network. Backpropagation is often used to compute the gradients needed by the algorithm called gradient descent.
Afrikaans, አማርኛ, Deutsch, English, Français

revisión de código

Checking a program, or a change to a program, by inspecting its source code.
አማርኛ, English, Українська

ruta absoluta

A path that points to the same location in the file system regardless of the context in which it is evaluated. An absolute path is the equivalent of latitude and longitude in geography.
relative_path
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, Deutsch, Ελληνικά, English, Français, Bahasa Indonesia, Italiano, Português, Українська

S

S

A language for data analysis, statistical modeling, and graphics originally developed at Bell Laboratories. R is a dialect of S.
English

S3

A framework for object-oriented programming in R.
English

secuencia de caracteres

A block of text in a program.
English, Українська

Secure Shell

A protocol, and the program that implements it, whose main purpose is remote access to a server over a secure channel in which all information is encrypted.
English, Українська

semilla

A value used to initialize a pseudo-random number generator.
Afrikaans, English
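
The effect of a seed can be seen with Python's standard `random` module: two generators initialized with the same seed produce the same sequence. The seed value 42 and the variable names are arbitrary choices for this sketch.

```python
import random

# Two independent generators initialized with the same seed.
first = random.Random(42)
second = random.Random(42)

run_a = [first.randint(1, 6) for _ in range(5)]
run_b = [second.randint(1, 6) for _ in range(5)]

# Same seed, same sequence: this is what makes pseudo-random results repeatable.
repeatable = run_a == run_b   # True
```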

sesgo

A statistic is biased if it is systematically or consistently different from the parameter it is supposed to estimate.
varianza, overfitting, clasificación, systematic_error
Afrikaans, አማርኛ, Deutsch, English, Français, Português, Kiswahili, Setswana, isiXhosa

sesgo algorítmico

Repeatable results that show a bias, or biased treatment, built into an algorithm. For example, social media algorithms may prioritize or stigmatize content from certain groups of users.
algoritmo
Afrikaans, አማርኛ, اَلْعَرَبِيَّةُ, English, Kiswahili

sesgo de selección en la variable dependiente

A study that includes only cases in which the dependent variable has the same value, rather than cases with variation in the dependent variable, suffers from selection bias on the dependent variable.
English

sistema binario

A system in which there are two possible states. In computing, the binary system is represented by the states 0 and 1. In Boolean logic, false is represented by 0 and true by 1. Computers operate in binary systems, storing information as bits.
Afrikaans, አማርኛ, Deutsch, English, Français, Português, Українська

sistema de archivos

The part of the operating system that manages how files are stored and retrieved. The term is also used to refer to all of those files and directories, or to the specific way in which they are stored (as in “the Unix file system”).
አማርኛ, Deutsch, English, 日本語, Українська

sistema de control de versión

A system for managing the changes made during software development.
git
English, Français, Português, Українська

SQL

A language used to write queries for a relational database. The term is an acronym for Structured Query Language.
English, Français

Stack Overflow

A question-and-answer site popular with programmers.
English, Português

subclase

In object-oriented programming, an extension of another class (called the superclass).
አማርኛ, English, Português, Українська

superclase

In object-oriented programming, the class from which other classes (called subclasses) are derived.
English, Português, Українська

T

tasa de aprendizaje (aprendizaje profundo)

In artificial neural networks, the learning rate is a hyperparameter that determines how quickly the network adjusts its weights to approximate the target better over time. A large learning rate can speed up training, but the network may overshoot and miss the global minimum. A small learning rate overshoots less but is slower, and it can also get stuck more easily in local minima.
Aprendizaje profundo, retropropagación, perceptrón, red neuronal, machine_learning
English

Teorema de Bayes

An equation for calculating the probability that something is true if something related to it is true. If P(X) is the probability that X is true and P(X|Y) is the probability that X is true given that Y is true, then P(X|Y) = P(Y|X) * P(X) / P(Y).
Red Bayesiana, naive_bayes_classifier, prior_distribution
Afrikaans, English
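
A worked example of the equation follows. The scenario (a diagnostic test), the probabilities, and the function name `bayes` are all made up for illustration; X stands for "has the condition" and Y for "tests positive".

```python
def bayes(p_y_given_x, p_x, p_y):
    """P(X|Y) = P(Y|X) * P(X) / P(Y)."""
    return p_y_given_x * p_x / p_y

p_x = 0.01               # assumed prior: 1% of people have the condition
p_y_given_x = 0.90       # assumed chance of a positive test if the condition is present
p_y_given_not_x = 0.05   # assumed chance of a positive test if it is absent

# Total probability of a positive test, over both cases.
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

posterior = bayes(p_y_given_x, p_x, p_y)   # roughly 0.15
```

Under these assumed numbers, a positive test raises the probability of having the condition from 1% to only about 15%, because positives from the much larger group without the condition dominate.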

tibble

A modern replacement for R's data frames that stores tabular data in columns and rows, defined and used in the tidyverse.
English, Français

Tidyverse

A collection of R packages for operating on tabular data in consistent ways.
English, Français, Português

U

UNIX

UNIX is a family of operating systems first developed in 1969 at AT&T Bell Labs. Its main characteristics are simple tools, well-defined functionality, and portability.
operating_system
English

V

variable (programa)

A name in a program that has some data associated with it. A variable's value can be changed after it is defined.
constante
اَلْعَرَبِيَّةُ, English, Français, 日本語, Українська

variable global

A variable defined outside of any particular function, which makes it visible to all functions.
variable local
አማርኛ, English, Français, Hrvatski, 한국어

variable independiente

The factor that you deliberately change or control in order to see what effect it has on the dependent variable.
Deutsch, English

variable local

A variable defined inside a function, which makes it visible only within that function.
closure, variable global
አማርኛ, English, Français, 한국어

varianza

A measure of how much the values in a dataset differ from the mean. It is calculated as the average of the squared differences between the values and the mean. The standard deviation is often used instead, because it has the same units as the data, whereas the variance is expressed in squared units.
Deutsch, English, Italiano, Kiswahili

vector

A sequence of values, usually of homogeneous type. Vectors are the fundamental data structure in R; a scalar is just a vector with exactly one element.
English

vectorizar

To write code so that operations are performed on entire vectors, rather than element by element within a loop.
English

verdadero negativo

An outcome in which the actual value is negative and is correctly predicted as negative.
verdadero positivo, false_negative, false_positive, machine_learning, clasificación
English

verdadero positivo

An outcome in which the actual value is positive and is correctly predicted as positive.
verdadero negativo, false_negative, false_positive, machine_learning, clasificación
English

Vim (editor)

The default text editor on many Unix systems. Vim is a powerful text editor that lets the user run shell commands and use regular expressions to edit files programmatically.
English

viñeta

A vignette is a long-form guide used to provide details about a package beyond the README.md or the documentation of a function.
English

X

XML

A set of rules for defining HTML-like tags and using them to format documents (typically data). XML became popular in the early 2000s, but its complexity led many programmers to adopt JSON instead.
Deutsch, English, Français, Português

Y

YAML

A recursive acronym for “YAML Ain't Markup Language”, YAML is a way to represent nested data using indentation rather than the parentheses and quotation marks used in JSON. YAML is often used in configuration files and to define parameters for various flavors of Markdown documents.
Afrikaans, Deutsch, English, Français, Bahasa Indonesia, Português