This course presents an overview of the pressing and often intractable ethical issues in the information fields, together with their related policies. Emerging technological developments are examined in relation to public interests and individual well-being throughout the course. Special emphasis is placed on case studies and their outcomes, as well as frameworks for ethical decision-making.
Degree Requirements – M.S. Data Science
The MS in Data Science provides students with the training they need in data collection, exploration, manipulation and storage, analysis, and presentation in order to navigate data-rich workplace environments. The degree requires 30 total units and can typically be completed in 1.5 years by full-time students.
Plan of Study
You should work with your faculty advisor to develop a Master’s Plan of Study during your first few months in the program. The Plan of Study should be submitted to the Graduate College no later than your second semester in the program.
The Master’s Plan of Study identifies 1) courses you intend to transfer from other institutions; 2) courses already completed at the University of Arizona that you intend to apply toward the graduate degree; and 3) additional coursework to be completed to fulfill degree requirements. The Plan of Study must have the approval of the Director of Graduate Studies before it can be submitted to the Graduate College.
Core Courses
- 9 units total
- Effective Fall 2023, students will be required to take either INFO 520: Ethical Issues in Information OR INFO 502: Data Ethics
This course will introduce students to the concepts and techniques of data mining for knowledge discovery. It includes methods developed in the fields of statistics, large-scale data analytics, machine learning, pattern recognition, database technology and artificial intelligence for automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns. Topics include understanding varieties of data, data preprocessing, classification, association and correlation rule analysis, cluster analysis, outlier detection, and data mining trends and research frontiers. We will use software packages for data mining, explaining the underlying algorithms and their use and limitations. The course includes laboratory exercises, with data mining case studies using data from many different sources such as social networks, linguistics, geo-spatial applications, marketing and/or psychology.
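As a concrete flavor of the clustering methods covered here, a minimal sketch assuming Python with NumPy and scikit-learn (the data and cluster count are hypothetical):

```python
# Minimal clustering sketch: group synthetic 2-D points with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two loose groups of points, standing in for real data such as customer features.
data = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
])

# Fit k-means with k=2 and inspect the discovered structure.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(model.cluster_centers_)  # approximate centers of the two groups
print(model.labels_[:10])      # cluster assignment for the first ten points
```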
This course provides an overview of the various concepts and skills required for effective data visualization. It presents principles of graphic design, programming skills, and statistical knowledge required to build compelling visualizations that communicate effectively to target audiences. Visualization skills addressed in this course include choosing appropriate colors, shapes, variable mappings, and interactivity based on principles of color perception, pre-attentive processing, and accessibility.
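For example, one of the core skills described above, mapping a categorical variable to an accessible color palette, might look like this minimal sketch assuming Python with NumPy and matplotlib (the data is made up):

```python
# Minimal visualization sketch: encode a categorical variable with a colorblind-safe palette.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2 * x + rng.normal(0, 2, 100)
group = rng.integers(0, 2, 100)  # a two-level categorical variable to encode

# Deliberate color choice (Okabe-Ito blue/vermillion) rather than default colors.
colors = np.where(group == 0, "#0072B2", "#D55E00")
plt.scatter(x, y, c=colors, alpha=0.7)
plt.xlabel("Predictor")
plt.ylabel("Response")
plt.title("Two groups distinguished by color")
plt.show()
```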
Experiential Courses
Complete 3 units total. More information on experiential courses is available on our internships and individual studies pages.
For either course:
- Identify your internship supervisor (INFO 693) or iSchool faculty supervisor (INFO 698)
- Request an experience via Handshake as described on our internships and individual studies pages
- The internship or capstone project must exercise all competencies required for the M.S. degree
- The internship or capstone project must have a software development component. Capstones must deposit code in GitHub or another source code repository
- Upon completing the internship or capstone project, submit a report (5000-6000 words in length) in the form of an academic paper, documenting what has been accomplished and explaining how the competencies have been demonstrated
- Your supervisor(s) will complete a competencies evaluation form, evaluate the project, and assign a pass/fail grade
You must submit your application in Handshake. More information can be found on the individual studies page.
Elective Courses
- 18 units total
- Students may complete the MS alone or use sets of courses to earn one or more graduate certificates at the same time. Please see the corresponding units' web pages for more information about their graduate certificates (e.g., Linguistics). Visit our Graduate Certificates page for more information about the certificate in Foundations of Data Science.
- Any non-core courses with the INFO prefix or out-of-department courses are considered electives.
The objective of the course is to provide a sound understanding of the fundamental statistical theory underlying econometric techniques utilized in quantitative analysis of problems in economics, business, finance, public health, and other social sciences.
Econometric model-building, estimation, forecasting and simulation for problems in agricultural and resource economics. Applications with actual data and models emphasized.
Emphasis in the course is on econometric model specification, estimation, inference, forecasting, and simulation. Applications with actual data and modeling techniques are emphasized.
Most of the web data today consists of unstructured text. This data is of little use unless it is made available such that users can quickly find the information relevant to their needs. This course will cover the fundamental knowledge necessary to build such systems, including web crawling, index construction and compression, Boolean, vector-based, and probabilistic retrieval models, text classification and clustering, link analysis algorithms such as PageRank, and computational advertising. Students will also complete one programming project, in which they will construct a complex application that combines multiple algorithms into a system that solves real-world problems.
This course covers important algorithms useful for natural language processing (NLP), including distributional similarity algorithms such as word embeddings, recurrent and recursive neural networks (NN), probabilistic graphical models useful for sequence prediction, and parsing algorithms such as shift-reduce. This course will focus on the algorithms that underlie NLP, rather than the application of NLP to various problem domains.
This course provides an introduction to technical aspects of cyber security. It describes threats and types of attacks against computers and networks to enable students to understand and analyze security requirements and define security policies. Security mechanisms and enforcement issues will be introduced. Students will be immersed in the cyber-security discipline through a combination of intense coursework, open-ended and real-world problems, and hands-on experiments.
Machine learning deals with the automated classification, identification, and/or characterization of an unknown system and its parameters. A wide range of application-driven fields can benefit from machine learning techniques. This course will introduce you to machine learning and develop core principles that allow you to determine which algorithm to use, or to design a novel approach to solve the engineering task at hand. This course will also use software technology to supplement the theory learned in class with applications using real-world data.
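As a small illustration of choosing between algorithms, a sketch assuming Python with scikit-learn (the dataset and candidate models are arbitrary stand-ins):

```python
# Minimal model-selection sketch: compare two candidate classifiers by cross-validation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Estimate out-of-sample accuracy for each candidate with 5-fold cross-validation.
candidates = [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("decision tree", DecisionTreeClassifier(random_state=0)),
]
for name, model in candidates:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```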
Cloud Computing is an emerging paradigm that aims at delivering computing, information services, and data storage as a utility service over a network (e.g., the Internet). There is strong interest in cloud computing due to its performance and cost benefits, but its rapid deployment exacerbates the security problem. In cloud computing, organizations relinquish direct control of many security aspects to the service providers, such as trust, privacy preservation, identity management, data and software isolation, and service availability. The adoption and proliferation of cloud computing and services will be severely impacted if cloud security is not adequately addressed. The main goal of this course is to discuss the limitations of current cybersecurity approaches to clouds and then focus on the fundamental issues in addressing cloud security and privacy, such as the confidentiality, integrity, and availability of data and computations in clouds. In this course we will examine cloud computing models, look into the threat model and security issues related to data and computation outsourcing, and explore practical applications to make cloud resources secure and resilient to cyber attacks.
Introduction to computer networks and protocols. Study of the ISO open systems interconnection model, with emphasis on the physical, data link, network, and transport layers. Discussion of IEEE 802, OSI, and Internet protocols. Graduate-level requirements include additional homework and assignments.
Provides an introduction to problems and techniques of artificial intelligence (AI). Automated problem solving, methods and techniques; search and game strategies, knowledge representation using predicate logic; structured representations of knowledge; automatic theorem proving, system entity structures, frames and scripts; robotic planning; expert systems; implementing AI systems. Graduate-level requirements include additional assignments.
The goal of this course is to gain an introductory understanding of geographic programming and data automation techniques using ModelBuilder and the Python language.
The focus of this class is to examine and apply GIS open source programming. We will examine commonly used languages such as Python, Java, and HTML5, as well as APIs, JSON, and SQL, to automate workflows, extend the tools, and create interactive web and mobile GIS platforms. Topics include preparing data as strings, lists, tuples, and dictionaries prior to use, using Python to run SQL queries, working with rasters in Python, automating mapping tasks, and developing custom scripting tools. In addition to weekly assignments and readings, assessment will be oriented around a single, student-directed project that will take the second half of the semester to complete. It will require students to write a simple script to accomplish a specified task in ArcGIS and present the results of their work to peers.
Basic structure and function of the Immune System and its role in fighting infectious diseases and cancers and causing immunological diseases.
This course will guide students through advanced applications of computational methods for social science research. Students will be encouraged to consider social problems from across sectors, including health science, environmental policy, education, and business. Particular attention will be given to the collection and analysis of data to study social networks, online communities, electronic commerce, and digital marketing. Students will consider the many research designs used in contemporary social research, including “Big” data, online surveys, and virtual experimental labs, and will think critically about claims of causality, mechanisms, and generalization.
Machine learning describes the development of algorithms that can modify their internal parameters (i.e., "learn") to recognize patterns and make decisions based on example data. These examples can be provided by a human, or they can be gathered automatically as part of the learning algorithm itself. This course will introduce the fundamentals of machine learning, will describe how to implement several practical methods for pattern recognition, feature selection, clustering, and decision making for reward maximization, and will provide a foundation for the development of new machine learning algorithms.
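To make the idea of parameter updates concrete, a from-scratch perceptron sketch in Python with NumPy (the data is synthetic; the perceptron is just one of the simplest examples of such an algorithm):

```python
# Minimal perceptron sketch: internal parameters (w, b) are nudged by misclassified examples.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # linearly separable toy labels

w = np.zeros(2)
b = 0.0
for _ in range(20):                      # a few passes over the training examples
    for xi, yi in zip(X, y):
        pred = int(w @ xi + b > 0)
        # Only misclassified examples change the parameters.
        w += (yi - pred) * xi
        b += yi - pred

accuracy = np.mean((X @ w + b > 0).astype(int) == y)
print(f"training accuracy: {accuracy:.2f}")
```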
Students will learn from experts on projects that have developed widely adopted foundational cyberinfrastructure resources, followed by hands-on laboratory exercises focused on those resources. Students will use these resources and gain practical experience from laboratory exercises for a final project using a data set and meeting requirements provided by domain scientists. Students will be provided access to computing resources at UA campus clusters, the iPlant Collaborative, and NSF XSEDE. Students will also learn to write a proposal for obtaining a future allocation on large-scale national resources through XSEDE. Graduate-level requirements include reading a paper related to cyberinfrastructure, presenting it to the class, and leading a discussion on the paper.
Data Warehousing and Analytics in the Cloud will utilize concepts, frameworks, and best practices for designing a cloud-based data warehousing solution and explore how to use analytical tools to perform analysis on your data. The first half of the course provides an overview of the field of cloud computing and its main concepts, and students will get hands-on experience through projects that utilize cloud computing platforms. In the second half of the course, we will examine the construction of a cloud-based data warehouse system and explore how the cloud opens up data analytics to huge volumes of data.
This course focuses on the use of modern data science methods to help learners make socially responsible decisions and mitigate harm that arises from issues like bias, discrimination, and threats to personal privacy. More and more individuals need to make data-driven decisions in a wide variety of contexts, including non-governmental organizations, not-for-profit industries, human services, environmental organizations, refugee camps, and more. Students in this class will thus learn about data science and how it can be utilized in contexts where socially good decisions are desired and emphasized. This active learning class is designed for students who have an interest in the topic but who may have little to no previous experience with data science or programming.
Most of the web data today consists of unstructured text. This course will cover the fundamental knowledge necessary to organize such texts, search them in a meaningful way, and extract relevant information from them. This course will teach natural language processing through the design and development of end-to-end natural language understanding applications, including sentiment analysis (e.g., is this review positive or negative?), information extraction (e.g., extracting named entities and their relations from text), and question answering (retrieving exact answers to natural language questions such as "What is the capital of France?" from large document collections). We will use several natural language processing toolkits, such as NLTK and Stanford's CoreNLP. The main programming language used in the course will be Python, but code written in Java or Scala will be accepted as well. Graduate-level requirements include implementing more complex, state-of-the-art algorithms for the three proposed projects. This will require additional reading of conference papers and journal articles.
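As a small taste of a sentiment-analysis application, a sketch using NLTK's VADER analyzer, one simple rule-based approach among the many techniques the course may cover (the reviews are invented):

```python
# Minimal sentiment sketch with NLTK's VADER lexicon-based analyzer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

analyzer = SentimentIntensityAnalyzer()
reviews = [
    "The plot was gripping and the acting was superb.",
    "A dull, overlong film with no redeeming qualities.",
]
for review in reviews:
    scores = analyzer.polarity_scores(review)  # dict with neg/neu/pos/compound scores
    label = "positive" if scores["compound"] > 0 else "negative"
    print(f"{label}: {review}")
```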
Most of the web data today consists of unstructured text. This data is of little use unless it is made available such that users can quickly find the information relevant to their needs. This course will cover the fundamental knowledge necessary to build such systems, including web crawling, index construction and compression, Boolean, vector-based, and probabilistic retrieval models, text classification and clustering, link analysis algorithms such as PageRank, and computational advertising. Students will also complete one programming project, in which they will construct a complex application that combines multiple algorithms into a system that solves real-world problems. Graduate-level requirements include implementing more complex, state-of-the-art algorithms for the programming project, which might require additional reading of research articles. Written assignments will have additional questions for graduate students.
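For a flavor of the link analysis topic, a minimal PageRank sketch via power iteration on a tiny hypothetical link graph, in plain Python:

```python
# Minimal PageRank sketch: iteratively redistribute rank along outgoing links.
links = {            # page -> pages it links to (a made-up four-page web)
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}
damping = 0.85

for _ in range(50):  # iterate until the scores stabilize
    new_rank = {p: (1 - damping) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

# Pages with many (highly ranked) in-links end up with the highest scores.
print(sorted(rank.items(), key=lambda kv: -kv[1]))
```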
Neural networks are a branch of machine learning that combines a large number of simple computational units to allow computers to learn from and generalize over complex patterns in data. Students in this course will learn how to train and optimize feed forward, convolutional, and recurrent neural networks for tasks such as text classification, image recognition, and game playing.
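As a minimal example of training a feedforward network, a sketch assuming Python with PyTorch (the data and architecture are arbitrary choices):

```python
# Minimal feedforward-network sketch: a small classifier trained on synthetic data.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 2)
y = (X[:, 0] * X[:, 1] > 0).long()  # an XOR-like target that needs a hidden layer

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(500):             # gradient-based training loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

accuracy = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy: {accuracy:.2f}")
```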
This course covers theory, methods, and techniques widely used to design and develop a relational database system, and students will develop a broad understanding of modern database management systems. Applications of fundamental database principles in a stand-alone database environment using MS Access and Windows are emphasized. Applications in an Internet environment will be discussed using MySQL on the Linux platform. Graduate-level requirements include a group project consisting of seven sections: Database Design; Implementation (Tables); Forms; Data Retrieval (Queries/Reports); Project Presentation; Project Report; and Peer Evaluation.
In today's digital society, people have access to a wide variety of information sources and scientific data. In this course, students will learn about the role of science and scientific data in society, and they will consider means for making science information findable and understandable for a wide variety of audiences. This course will provide students an interdisciplinary experience for considering science data and how that information gets shared across contexts.
This course provides an overview of modern database systems. Both relational databases (SQL) and a few non-relational databases (NoSQL) are covered, including topics on data warehouses. The focus of the course is on the practical skills of designing and implementing data storage and access for the data and information sciences. Topics covered include ER diagrams, database normalization, data modeling in NoSQL databases, SQL and other query languages, and data warehousing. The course will selectively cover one or two types of NoSQL databases, for example, document-oriented, key-value pair, column-oriented, or graph databases. Database platforms used in this course may change over time; some examples include MySQL, PostgreSQL, Apache HBase, Apache Cassandra, MongoDB, and Neo4j.
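As a compact illustration of the relational side, a sketch using Python's built-in sqlite3 module as a stand-in for the platforms listed above (the table and rows are invented):

```python
# Minimal relational sketch: create a table, insert rows, and run an aggregate query.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO student (name, units) VALUES (?, ?)",
    [("Ana", 30), ("Ben", 18), ("Chen", 30)],
)

# How many students have completed 30 units?
count = conn.execute("SELECT COUNT(*) FROM student WHERE units = 30").fetchone()[0]
print(count)  # -> 2
conn.close()
```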
Organizing information in electronic formats requires standard machine-readable languages. This course covers recent standards including XML (eXtensible Markup Language) and related technologies (XPath and XSLT), which are used widely in current information organization systems. Building on a sound understanding of XML technologies, the course also introduces students to newer standards that support the development of the Semantic Web. These standards include RDF (Resource Description Framework), RDFS (RDF Schema), and OWL (Web Ontology Language) and their application under the Linked Data paradigm. While many specific XML schemas used in libraries and other information settings such as science and business will be used to provide context for various topics, the main focus of the course is on understanding the concepts of XML and Semantic Web technologies and on applying practical skills in various settings, including but not limited to libraries. The course is heavy with hands-on assignments and requires students to complete a final group project.
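For a feel of the XML side of the course, a minimal sketch using Python's standard library, which supports a subset of XPath (the document is invented):

```python
# Minimal XML sketch: parse a small record and select elements with XPath-style paths.
import xml.etree.ElementTree as ET

record = """
<collection>
  <book id="b1"><title>Data and Society</title><year>2021</year></book>
  <book id="b2"><title>Linked Data Basics</title><year>2019</year></book>
</collection>
"""
root = ET.fromstring(record)

# Titles of all books published after 2020.
for book in root.findall("book"):
    if int(book.findtext("year")) > 2020:
        print(book.findtext("title"))  # -> Data and Society
```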
This course introduces the key concepts underlying statistical natural language processing. Students will learn a variety of techniques for the computational modeling of natural language, including: n-gram models, smoothing, Hidden Markov models, Bayesian Inference, Expectation Maximization, Viterbi, Inside-Outside Algorithm for Probabilistic Context-Free Grammars, and higher-order language models. Graduate-level requirements include assignments of greater scope than undergraduate assignments. In addition to being more in-depth, graduate assignments are typically longer and additional readings are required.
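To make the n-gram and smoothing ideas concrete, a minimal bigram language-model sketch with add-one smoothing, in plain Python (the corpus is a toy example):

```python
# Minimal bigram language model with add-one (Laplace) smoothing.
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = set(corpus)

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def prob(word, prev):
    """P(word | prev) with add-one smoothing over the toy vocabulary."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

print(prob("cat", "the"))  # seen bigram: higher probability
print(prob("rug", "cat"))  # unseen bigram: lower, but nonzero thanks to smoothing
```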
Topics include speech synthesis, speech recognition, and other speech technologies. This course gives students background for a career in the speech technology industry. Graduate students will do extra readings, extra assignments, and have an extra presentation. Their final project must constitute original work in a speech technology.
This course provides a hands-on project-based approach to particular problems and issues in computational linguistics.
This course focuses on statistical approaches to pattern classification and applications of natural language processing to real-world problems.
Statistical methodology of estimation, testing hypotheses, goodness-of-fit, nonparametric methods and decision theory as it relates to engineering practice. Significant emphasis on the underlying statistical modeling and assumptions. Graduate-level requirements include additional, more difficult homework assignments.
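A minimal hypothesis-testing sketch of the kind this course formalizes, assuming Python with NumPy and SciPy (the samples are simulated):

```python
# Minimal two-sample t-test sketch: is the difference in means statistically significant?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
baseline = rng.normal(loc=10.0, scale=1.0, size=40)  # e.g., measurements from process A
modified = rng.normal(loc=10.6, scale=1.0, size=40)  # e.g., measurements from process B

t_stat, p_value = stats.ttest_ind(baseline, modified)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # a small p suggests a real difference in means
```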
This course will provide senior undergraduate and graduate students from diverse engineering disciplines with fundamental concepts, principles, and tools to extract and generalize knowledge from data. Students will acquire an integrated set of skills spanning data processing, statistics, and machine learning, along with a good understanding of the synthesis of these skills and their application to solving problems. The course is composed of a systematic introduction to the fundamental topics of data science, including: (1) principles of data processing and representation, (2) theoretical basis and advances in data science, (3) modeling and algorithms, and (4) evaluation mechanisms. The emphasis in the treatment of these topics will be on breadth rather than depth. Real-world engineering problems and data will be used as examples to illustrate and demonstrate the advantages and disadvantages of different algorithms, compare their effectiveness as well as efficiency, and help students understand and identify the circumstances under which the algorithms are most appropriate.
Unconstrained and constrained optimization problems from a numerical standpoint. Topics include variable metric methods, optimality conditions, quadratic programming, penalty and barrier function methods, interior point methods, successive quadratic programming methods.
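As a small worked example, a constrained-optimization sketch using SciPy's SLSQP solver, a sequential quadratic programming method (the objective and constraint are invented):

```python
# Minimal constrained-optimization sketch with a sequential quadratic programming solver.
import numpy as np
from scipy.optimize import minimize

def objective(v):
    # f(x, y) = (x - 1)^2 + (y - 2)^2
    return (v[0] - 1) ** 2 + (v[1] - 2) ** 2

# Inequality constraint x + y <= 2, written in SciPy's g(v) >= 0 form.
constraint = {"type": "ineq", "fun": lambda v: 2 - (v[0] + v[1])}

result = minimize(objective, x0=np.array([0.0, 0.0]),
                  method="SLSQP", constraints=[constraint])
print(result.x)  # optimum lies on the constraint boundary, near (0.5, 1.5)
```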
Decomposition-coordination algorithms for large-scale mathematical programming. Methods include generalized Benders decomposition, resource and price directive methods, subgradient optimization, and descent methods of nondifferentiable optimization. Application of these methods to stochastic programming will be emphasized.
This course is devoted to structure and properties of practical algorithms for unconstrained and constrained nonlinear optimization.