- Joint Analysis of Text and Behavior Data
- Task-based Online Interactive Decision Optimization
- Collaborative Sensing: An Approach for Immediately Scalable Sensing in Buildings
- Cyber Physical Mappings - Empower Building Analytics at Scale
- Privacy-Preserving Personalization
Research Description: Humans are both producers and consumers of Big Data. Modeling the role of humans in the process of mining Big Data is thus critical for unleashing the vast potential of mined knowledge in various important domains such as health, education, security, and scientific discovery. The objective of this research is to build a human-centric learning framework, which harnesses the power of human-generated Big Data with computational models.
We believe that human-generated natural language text data and interactive behavior data are neither random nor isolated, but are governed by their underlying intentions. In this research, we develop probabilistic generative models that jointly model text data and various types of human behavior data. We further refine this assumption: at the micro level, humans' behaviors are governed by their own intentions; at the macro level, humans influence each other through social dynamics and social norms. We therefore place our analyses in the context of social influence and explicitly model the interactions among humans during the data generation process. Figure 1 illustrates the basic idea of this research.
Figure 1. Human-centric knowledge discovery and decision optimization. In this loop, systems' utility can be improved through in-depth understanding of humans (i.e., the flow from humans to systems), and humans' decision making can be optimized through customized knowledge services (i.e., the flow from systems to humans).
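To make the idea of a joint generative model concrete, here is a minimal sketch under strong simplifying assumptions: one latent intent per user, a fixed word distribution and a fixed behavior distribution per intent, and text and behavior treated as conditionally independent given the intent. The intent names, vocabulary, behaviors, and probabilities are all hypothetical, and social influence between users is not modeled. The sketch shows how observing behavior alongside text sharpens the inferred intent.

```python
import math
import random

# Hypothetical toy parameters (not from the project): two latent intents.
PRIOR = {"shopping": 0.5, "research": 0.5}
# p(word | intent): each intent has its own word distribution.
WORD_PROBS = {
    "shopping": {"price": 0.5, "deal": 0.4, "paper": 0.1},
    "research": {"price": 0.1, "deal": 0.1, "paper": 0.8},
}
# p(behavior | intent): hypothetical behaviors, e.g. clicking or skipping an ad.
BEHAVIOR_PROBS = {
    "shopping": {"click_ad": 0.7, "skip_ad": 0.3},
    "research": {"click_ad": 0.1, "skip_ad": 0.9},
}


def generate(n_words, n_behaviors, rng):
    """Sample an intent, then words and behaviors conditioned on that intent."""
    z = rng.choices(list(PRIOR), weights=list(PRIOR.values()))[0]
    words = rng.choices(list(WORD_PROBS[z]), weights=list(WORD_PROBS[z].values()), k=n_words)
    behaviors = rng.choices(
        list(BEHAVIOR_PROBS[z]), weights=list(BEHAVIOR_PROBS[z].values()), k=n_behaviors
    )
    return z, words, behaviors


def intent_posterior(words, behaviors):
    """Return p(intent | words, behaviors) under the toy joint model.

    Text and behavior are conditionally independent given the intent, so the
    log-likelihood is a sum over both kinds of observation.
    """
    log_scores = {}
    for z, prior in PRIOR.items():
        s = math.log(prior)
        s += sum(math.log(WORD_PROBS[z][w]) for w in words)
        s += sum(math.log(BEHAVIOR_PROBS[z][b]) for b in behaviors)
        log_scores[z] = s
    # Normalize in log space for numerical stability.
    m = max(log_scores.values())
    unnorm = {z: math.exp(s - m) for z, s in log_scores.items()}
    total = sum(unnorm.values())
    return {z: v / total for z, v in unnorm.items()}


print(intent_posterior(["price"], []))            # text only
print(intent_posterior(["price"], ["click_ad"]))  # text plus behavior
```

With these toy numbers, the word `price` alone gives the shopping intent a posterior of about 0.83; adding a single `click_ad` observation raises it to about 0.97.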
Team Members: Lin Gong, Lu Lin, Yuling Shi, and Hongning Wang
Collaborators: Md Mustafizur Rahman, Renqin Cai, Jiachuan Deng, Zuo Li and Benjamin Haines
Research Description: Traditional static, ad-hoc, and passive machine-human interactions are inadequate for optimizing humans' dynamic decision-making processes. To address this limitation, users' longitudinal information-seeking activities are organized into tasks, on which new online learning algorithms are applied to proactively infer users' intents and adapt the systems for long-term utility optimization.
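The description does not say how activities are grouped into tasks. Purely as a hypothetical illustration, the sketch below uses a simple lexical-overlap heuristic: each query joins the still-active task whose accumulated terms it overlaps most (by Jaccard similarity), or starts a new task. The function name, the 0.2 similarity threshold, and the 30-minute inactivity window are assumptions, not the project's method.

```python
def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def segment_into_tasks(queries, sim_threshold=0.2, max_gap=1800):
    """Group time-ordered (timestamp_seconds, query_text) pairs into tasks.

    A query is attached to the most similar task that was active within
    `max_gap` seconds; otherwise it opens a new task. Interleaved tasks are
    allowed because every active task is compared, not just the latest one.
    """
    tasks = []  # each task: {"terms": set, "last_time": float, "queries": list}
    for t, text in queries:
        terms = set(text.lower().split())
        best, best_sim = None, 0.0
        for task in tasks:
            if t - task["last_time"] > max_gap:
                continue  # task is considered inactive
            sim = jaccard(terms, task["terms"])
            if sim > best_sim:
                best, best_sim = task, sim
        if best is not None and best_sim >= sim_threshold:
            best["terms"] |= terms
            best["last_time"] = t
            best["queries"].append(text)
        else:
            tasks.append({"terms": set(terms), "last_time": t, "queries": [text]})
    return [task["queries"] for task in tasks]


log = [
    (0, "cheap flights paris"),
    (60, "weather in boston"),
    (120, "paris hotels cheap"),
    (200, "boston weather tomorrow"),
    (10000, "paris museums"),
]
print(segment_into_tasks(log))
```

Here the Paris and Boston queries form two interleaved tasks, and the much later "paris museums" query starts a third task because the earlier ones have gone inactive.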
The assumption about the non-stochastic nature of human behaviors can be further strengthened when we take a longitudinal view: behaviors are executed in an order that fulfills an underlying goal, and they may be systematically affected by a system's output during the interaction. In this research, we first organize users' interactive activities into tasks and then perform online optimization of a system's decision support to achieve long-term optimality for both humans and systems. In particular, we aim to develop a set of algorithmic solutions for online learning in a collaborative environment, where the feedback to the learning agents is assumed to be interrelated, noisy, and of non-negligible cost. Figure 2 illustrates the idea of interactive online learning with users.
Figure 2. Task-based interactive online learning with users. The learning agent takes real-time feedback from users to update its interaction strategy for long-term optimality.
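The description does not name a specific learning algorithm. As a stand-in, the sketch below implements the standard single-user LinUCB contextual bandit: after each round of feedback it updates a ridge-regression estimate of the reward, and it chooses the option with the highest estimate plus an exploration bonus for uncertainty. It does not model the interrelated, collaborative feedback studied in this project; the arm features, the preference vector `true_theta`, the noise level, and `alpha` are hypothetical.

```python
import numpy as np


class LinUCB:
    """Single-user LinUCB with a linear reward model over arm features."""

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha            # weight of the exploration bonus
        self.A = lam * np.eye(dim)    # regularized Gram matrix of chosen features
        self.b = np.zeros(dim)        # reward-weighted sum of chosen features

    def select(self, arm_features):
        """Return the index of the arm with the highest upper confidence bound."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                      # ridge-regression estimate
        means = arm_features @ theta
        # Per-arm uncertainty: sqrt(x^T A^{-1} x) for each row x.
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", arm_features, A_inv, arm_features))
        return int(np.argmax(means + self.alpha * bonus))

    def update(self, x, reward):
        """Incorporate the observed reward for the chosen feature vector x."""
        self.A += np.outer(x, x)
        self.b += reward * x


# Hypothetical simulation: three options with one-hot features.
rng = np.random.default_rng(0)
true_theta = np.array([0.6, -0.2, 0.4])   # hidden user preference (assumed)
arms = np.eye(3)
agent = LinUCB(dim=3, alpha=1.0)
choices = []
for _ in range(500):
    a = agent.select(arms)
    reward = arms[a] @ true_theta + rng.normal(0.0, 0.05)  # noisy feedback
    agent.update(arms[a], reward)
    choices.append(a)

print("best arm chosen in last 100 rounds:", choices[-100:].count(0))
```

Because the exploration bonus shrinks as an option is tried more often, the agent spends some early rounds on the weaker options and then settles on the best one.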
Team Members: Qingyun Wu, Huazheng Wang and Hongning Wang
Collaborators: Quanquan Gu
Research Description: Buildings are complex Cyber-Physical Systems with profound impact on human health, productivity, comfort, and energy consumption. Smart building technology promises to improve many aspects of building operation by collecting and analyzing sensor data to support informed and precise building operation. However, adoption of smart building applications is inhibited by the fact that new sensors must be installed in every building.
This research aims to improve the scalability of smart building applications by developing a new technique called collaborative sensing: estimating the sensor data of one building based on sensor data collected in other buildings. The basic premise is that common design and construction patterns for buildings create a repeating structure in their sensor data. Thus, a sparse sensing basis can be used to represent sensor data from a broad range of buildings. A model of a building can be constructed from this sensing basis using only a small amount of data, such as utility meter readings, climate zone, and square footage. This low-dimensional model can then be used to reconstruct sensor data for the building based on high-fidelity data collected in other buildings. This approach aims to shift toward a new paradigm in which smart building functionality can be applied to new buildings without the need to install specialized instrumentation. Preliminary testing using publicly available sub-metering data from hundreds of buildings indicates that this approach is not only more scalable but also sometimes more accurate than state-of-the-art alternatives. Figure 3 shows the overall idea of collaborative sensing.
Figure 3. Collaborative sensing: produce an energy breakdown for a target building based on sub-metering data collected in other similar buildings.
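One way to picture a shared sensing basis is as a low-rank factorization. The sketch below is a stand-in, not the project's model: it learns a rank-`r` basis from the SVD of fully metered training buildings, fits a target building's coefficients using only its observed columns (for example, hypothetical monthly whole-building meter readings), and then reads off the unobserved columns (for example, hypothetical per-appliance sub-metered readings). The function name and the synthetic, exactly low-rank data are assumptions for illustration.

```python
import numpy as np


def reconstruct(train, target_obs, obs_idx, rank):
    """Estimate a target building's full data vector from a few observed entries.

    train:      (n_buildings, n_columns) fully metered training buildings
    target_obs: values of the target building at the observed columns
    obs_idx:    indices of the observed columns
    rank:       number of basis vectors to keep
    """
    # Shared basis: top right-singular vectors of the training matrix.
    _, _, vt = np.linalg.svd(train, full_matrices=False)
    basis = vt[:rank]                                   # (rank, n_columns)
    # Fit the target's low-dimensional coefficients on the observed columns only.
    coef, *_ = np.linalg.lstsq(basis[:, obs_idx].T, target_obs, rcond=None)
    # Reconstruct every column, including the unobserved ones.
    return basis.T @ coef


# Synthetic example: 6 columns per building, columns 0-2 observed, 3-5 unknown.
rng = np.random.default_rng(1)
true_basis = rng.random((2, 6))                # hypothetical shared structure
train = rng.random((20, 2)) @ true_basis       # 20 fully metered buildings
target = np.array([0.3, 0.7]) @ true_basis     # new building, mostly unmetered
obs_idx = [0, 1, 2]
estimate = reconstruct(train, target[obs_idx], obs_idx, rank=2)
print("estimated unobserved columns:", estimate[3:])
print("true unobserved columns:     ", target[3:])
```

On real data the structure is only approximately low-rank, so the reconstruction would be an estimate rather than exact; the synthetic case here is exact by construction.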
Team Members: Nipun Batra, Yiling Jia, Hongning Wang and Kamin Whitehouse
Collaborators: Kamin Whitehouse
Research Description: Buildings are complex Cyber-Physical Systems that have profound impact on human health, productivity, comfort, and energy consumption. For example, building operation is the single largest energy consumer in the US, accounting for 70% of electricity consumption and 40% of total energy consumption. Allergens, noise levels, and the availability of sunlight affect health and well-being, especially given that on average Americans spend 90% of their time in buildings. Indoor conditions such as thermal comfort and CO2 and pollutant concentrations have been shown to affect human productivity by 8-11%, which has an important effect on the national economy.
The performance of the nation's buildings can be significantly improved with analytics engines that collect and analyze data from the thousands of sensing and control points that already exist within a typical building. However, raw data carries no inherent meaning, so a person must manually provide the context (also called metadata) for every sensor and controller before the analytics engine can interpret the data. This costly manual process can take days or weeks for a single building and is a major obstacle to applying building analytics across a large number of buildings.
The goal of this project is to create a suite of automated cyber-physical mapping solutions that will allow an advanced analytics engine to quickly connect to and analyze the data from a new and unfamiliar building. To achieve this, the project will produce new metadata inference techniques that automatically infer the metadata of each data stream, such as the type of sensor or controller that produced the data and its relation to other sensors, equipment, or rooms in the building. This approach is based on the hypothesis that time series data in buildings is sparse because many points share common context, such as weather patterns or equipment operation patterns. The contextual information itself is also highly structured, because common design practices create repeated patterns in buildings around the world, such as spatial layout and mechanical equipment. Metadata inference exploits this sparsity and structure to quickly and easily create new metadata values for a large number of sensing and control points. If successful, these techniques will generalize to other types of Cyber-Physical Systems, such as human health monitoring, infrastructure monitoring, or smart transportation systems, where structure and data sparsity can be used to automatically infer the physical context of a sensor or controller.
Figure 4. Metadata Inference draws on three data sources: 1) point names (center), 2) raw time series data (center), and 3) search and access patterns (left). The goal is to create new metadata information for each point (right). Values such as building name, sensor type, and relationships between sensors (e.g., sharing the same control equipment) can be automatically inferred by matching structure in the time series data with structure in the point names. New values such as room orientation or the usefulness of a point can be inferred by matching structure in the time series with structure in the maintenance logs.
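As a rough illustration of matching structure in time series, the sketch below links points whose readings are strongly correlated and returns the connected groups, each a candidate set of points that share control equipment. The point names, the synthetic signals, and the 0.9 correlation threshold are hypothetical, and the sketch ignores point names and the other data sources that the project's inference also draws on.

```python
import numpy as np


def group_correlated_points(names, series, threshold=0.9):
    """Group points whose time series have Pearson correlation >= threshold.

    names:  list of point names
    series: (n_points, n_samples) array, one row per point
    Groups are connected components of the "highly correlated" graph.
    """
    corr = np.corrcoef(series)          # pairwise correlation between rows
    parent = list(range(len(names)))    # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] >= threshold:
                parent[find(i)] = find(j)   # merge the two components

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


# Synthetic day of 15-minute samples: two air handlers with different cycles.
t = np.arange(96)
cycle_1 = np.sin(2 * np.pi * t / 96)
cycle_2 = np.cos(2 * np.pi * t / 96)
names = ["AHU1.SupplyTemp", "AHU2.SupplyTemp", "AHU1.FanSpeed", "AHU2.FanSpeed"]
series = np.vstack([55 + 2 * cycle_1, 55 + 2 * cycle_2, 60 + 10 * cycle_1, 60 + 10 * cycle_2])
print(group_correlated_points(names, series))
```

In practice, groups found this way could then be checked against structure in the point names (here the shared `AHU1`/`AHU2` prefixes), in the spirit of the matching described in the caption.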
Team Members: Dezhi Hong, Renqin Cai, Kamin Whitehouse and Hongning Wang
Collaborators: Trane Inc.
Sponsors: NSF - III: Small: Cyber Physical Mappings - Empower Building Analytics at Scale, and the U.S. Department of Energy - The Building Adapter: Automatic Mapping of Commercial Buildings for Scalable Building Analytics.
Research Description: Most existing efforts in online privacy protection have focused on the identifiability aspect of privacy (i.e., who issued the inquiry), using secure communication and encrypted storage developed in the security community. This research concerns another important aspect of privacy: linkability. Linkability is what enables a provider to link multiple queries to the same user and thereby learn a detailed profile of that user's interests and behaviors. A system with statistical inference mechanisms can largely decipher users' identities and underlying intentions from simple observations of their information-seeking behaviors. This leaves users with little ability to prevent "curious" systems from abusing their personal information (e.g., through digital discrimination), or even to become aware of such abuse, since the information already resides on the server side.
In this research, algorithmic solutions will be developed to protect the linkability aspect of privacy, based on an adversarial assumption motivated by an economics perspective. On the user side, personalization is achieved by modeling each user as a probabilistic mixture over a set of latent groups, which is used to customize the results. On the server side, without other external knowledge, a service provider can at most recognize a user's underlying intents at the group level from its distorted observations of the user's queries and clicks. By modeling users' long-term interactive behaviors as tasks and applying the proposed generative model in an online fashion, the balance between privacy and personalization can be dynamically adjusted according to users' sequential feedback over the long run. A testbed based on public data sets, together with new evaluation metrics, will be developed to assess the practical value of the developed algorithms for privacy-preserving personalization. Figure 5 demonstrates the basic workflow of privacy protection developed in this research.
Figure 5. Privacy-preserving personalization: users' search behaviors are modeled with a generative mixture model on the client side, which obfuscates users' search intents in a probabilistic way.
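The client/server split described above can be sketched as follows, under hypothetical assumptions: group-level click profiles are public, the client keeps its mixture weights private, and each request carries only a group id sampled from that mixture instead of a user identifier. The server can then personalize only at the group level, while the client re-ranks locally with its full mixture. The profile values, document ids, and function names are illustrative, and the sketch omits the query and click distortion and the online privacy/personalization adjustment described in the text.

```python
import random

# Hypothetical public group-level profiles: p(click on doc | group).
GROUP_PROFILES = {
    0: {"d1": 0.7, "d2": 0.2, "d3": 0.1},
    1: {"d1": 0.1, "d2": 0.3, "d3": 0.6},
}


def obfuscated_request(query, mixture, rng):
    """Client side: attach a group id sampled from the private mixture."""
    groups = list(mixture)
    g = rng.choices(groups, weights=[mixture[k] for k in groups])[0]
    return {"query": query, "group": g}   # no user identifier leaves the client


def server_rank(request, candidates):
    """Server side: can personalize only at the group level it observes."""
    profile = GROUP_PROFILES[request["group"]]
    return sorted(candidates, key=lambda d: profile[d], reverse=True)


def client_rerank(results, mixture):
    """Client side: re-rank by sum_k pi_k * p(click on doc | group k)."""
    def score(d):
        return sum(w * GROUP_PROFILES[k][d] for k, w in mixture.items())
    return sorted(results, key=score, reverse=True)


mixture = {0: 0.3, 1: 0.7}   # this user's private mixture weights (assumed)
request = obfuscated_request("laptop reviews", mixture, random.Random(42))
results = server_rank(request, ["d1", "d2", "d3"])
print(request["group"], client_rerank(results, mixture))
```

With these weights the client's final order is `d3`, `d1`, `d2` regardless of which group the server saw, so the personalized ranking never requires revealing the mixture itself.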
Team Members: Wasi Uddin Ahmad (UCLA), Kai-Wei Chang (UCLA), Puxuan Yu (Wuhan University) and Hongning Wang
Collaborators: Denis Nekipelov and David Evans
Sponsor: The University of Virginia School of Engineering and Applied Science Cyber-security Projects Initiative