Tag: paper

NeurIPS: Data space as the dual to feature space

We use “data slices” to evaluate our cybersecurity ML systems for the asset attribution task at Palo Alto Networks. For us, data slices are the dual of feature explanations. By segmenting our data into subunits with known properties, we can verify that improvements intended to address model blind spots actually succeed, detect model regressions, and characterize differences between models.
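The core mechanic can be sketched in a few lines. This is a minimal illustration, not our production tooling; the slice names and example data are hypothetical:

```python
# Minimal sketch of slice-based evaluation: compute accuracy per slice so
# a regression on one slice isn't hidden inside the aggregate metric.
from collections import defaultdict

def accuracy_by_slice(examples):
    """examples: iterable of (slice_name, is_correct) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for slice_name, is_correct in examples:
        totals[slice_name] += 1
        hits[slice_name] += int(is_correct)
    return {s: hits[s] / totals[s] for s in totals}

# Two models with the same aggregate accuracy can differ sharply on a
# slice with known properties (slice names here are made up).
model_a = [("cdn", True), ("cdn", True), ("direct", False), ("direct", True)]
model_b = [("cdn", False), ("cdn", True), ("direct", True), ("direct", True)]

print(accuracy_by_slice(model_a))  # {'cdn': 1.0, 'direct': 0.5}
print(accuracy_by_slice(model_b))  # {'cdn': 0.5, 'direct': 1.0}
```

Comparing these per-slice dictionaries across model versions is what lets us detect regressions that the overall accuracy number would mask.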

I described our approach in a paper on using data subsets to evaluate ML systems for the internet asset attribution problem, which was accepted to the NeurIPS Data-Centric AI (DCAI) Workshop held on 14 December 2021. The DCAI workshop focused on practical tooling, best practices, and infrastructure for data management in modern ML systems. The paper discusses two themes: (1) data slices, and (2) their application to our asset attribution task in cybersecurity.

End-to-end neural networks for subvocal speech recognition

My final project for Stanford CS 224S was on subvocal speech recognition. This was my last paper at Stanford; it draws on everything I learned in a whirlwind of CS grad school without a CS undergraduate major. Pol Rosello provided the topic; he and I contributed equally to the paper.

We describe the first approach toward end-to-end, session-independent subvocal automatic speech recognition from involuntary facial and laryngeal muscle movements detected by surface electromyography. We leverage character-level recurrent neural networks and the connectionist temporal classification (CTC) loss. We attempt to address challenges posed by a lack of data, including poor generalization, through data augmentation of electromyographic signals, a specialized multi-modal architecture, and regularization. We show results indicating reasonable qualitative performance on test set utterances, and describe promising avenues for future work in this direction.
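To give a flavor of the data-augmentation idea: one common way to stretch a scarce signal dataset is to add small random perturbations to each training example. This is a hedged sketch of that generic technique; the paper's actual EMG augmentations may differ:

```python
# Additive-noise augmentation for a 1-D signal (illustrative only).
import random

def jitter(signal, sigma=0.05, seed=None):
    """Return a copy of `signal` with small Gaussian noise added to each
    sample -- a cheap way to multiply scarce training data."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in signal]

emg = [0.0, 0.3, -0.2, 0.5]
augmented = [jitter(emg, seed=i) for i in range(4)]  # four noisy variants
```

Each variant is a plausible neighbor of the original signal, which discourages the network from memorizing exact waveforms and helps with the generalization problems the paper discusses.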

Automatically assessing Integrative Complexity

My final project for Stanford CS 224U was on automatically assessing integrative complexity. I drew on my earlier work demonstrating the ongoing value of this political psychology construct, but I had not previously tried to code for it automatically. The code is available on GitHub.

Integrative complexity is a construct from political psychology that measures semantic complexity in discourse. Although this metric has been shown useful in predicting violence and understanding elections, it is very time-consuming for analysts to assess. We describe a theory-driven automated system that improves the state of the art for this task from Pearson’s r = 0.57 to r = 0.73 through framing the task as ordinal regression, leveraging dense vector representations of words, and developing syntactic and semantic features that go beyond lexical phrase matching. Our approach is less labor-intensive and more transferable than the previous state of the art for this task.
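A standard way to frame an ordinal label for regression is the cumulative-binary encoding (Frank and Hall's reduction): a 1–7 score becomes six ordered binary questions, "is the score greater than k?". This sketch shows only that encoding trick, not the paper's full model:

```python
# Cumulative-binary framing of ordinal regression (illustrative sketch).
K = 7  # integrative complexity is scored on a 1-7 scale

def to_cumulative(y):
    """Encode an ordinal label y in 1..K as K-1 ordered binary targets."""
    return [int(y > k) for k in range(1, K)]

def from_cumulative(bits):
    """Decode: the predicted score is 1 plus the thresholds passed."""
    return 1 + sum(bits)

print(to_cumulative(3))                     # [1, 1, 0, 0, 0, 0]
print(from_cumulative([1, 1, 0, 0, 0, 0]))  # 3
```

Unlike treating the seven scores as unrelated classes, this framing lets the model exploit the ordering: confusing a 3 with a 4 is penalized less than confusing a 3 with a 7.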

Modeling decision making for IARPA

IARPA, the wackier younger sibling of the government’s wild idea incubator DARPA, wanted to understand the hidden intentions of decision makers. To that end, we delivered a computational model of decision making in Java. By building a model, we enabled users to: (1) perturb the system to estimate the robustness of an outcome, and (2) experiment with levers of influence. The model integrated with a complex set of other related systems built by other contractors.

I learned a lot of decision theory as part of implementing this model. My main takeaways are:

  • Individuals’ decisions can be represented as “decision matrices” (interests by available choices, with weights in each cell). We often want the interests to be ranked. It is often easiest to weight choices ordinally within each interest, with ties allowed.
  • You can influence an outcome by influencing the decision-making environment. For instance, adding more interests can change outcomes. Changing who is consulted can also change the outcome.
  • Decision matrices can be turned into decisions through multiple techniques, including: (1) maximizing expected utility, (2) eliminating the worst options on the most important dimension (“elimination by aspects”), (3) choosing the best option on the most important dimension (“lexicographic decision heuristic”), and (4) maximin (maximizing the worst-case payoff).
  • The maximin heuristic is very risk averse. People tend to use it when the decision is especially difficult.
  • Social influence and power relationships can weaken/strengthen/change people’s interests and decision matrix weights.
  • The group’s style of decision making (à la the Vroom-Yetton model) informs how individual choices are aggregated.
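Three of the decision rules above are easy to make concrete on a toy decision matrix. This is an illustrative sketch, not the delivered model; the interests, choices, and weights are invented:

```python
# A toy decision matrix: interests x choices, weights in each cell.
# Interests are listed from most to least important.
matrix = {
    "security": {"A": 3, "B": 1, "C": 2},
    "cost":     {"A": 1, "B": 3, "C": 2},
}

def expected_utility(matrix, importance):
    """Pick the choice maximizing the importance-weighted sum of weights."""
    choices = next(iter(matrix.values())).keys()
    return max(choices,
               key=lambda c: sum(importance[i] * row[c]
                                 for i, row in matrix.items()))

def lexicographic(matrix, ranked_interests):
    """Pick the best choice on the most important interest alone."""
    top = matrix[ranked_interests[0]]
    return max(top, key=top.get)

def maximin(matrix):
    """Pick the choice with the best worst-case weight (risk averse)."""
    choices = next(iter(matrix.values())).keys()
    return max(choices,
               key=lambda c: min(row[c] for row in matrix.values()))

print(expected_utility(matrix, {"security": 2, "cost": 1}))  # 'A'
print(lexicographic(matrix, ["security", "cost"]))           # 'A'
print(maximin(matrix))                                       # 'C'
```

Note how maximin picks C, the option that is never worst on any interest, even though A wins under the other two rules; that is the risk aversion described above.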

Using Java professionally (and for the strange bedfellow of modeling) meant a lot of learning for me too. (Once burned by a boxed variable, never again.)

Using social network analysis to anticipate rare events

My boss Elisa Bienenstock and I wrote a white paper on how Social Network Analysis can help forecast and detect rare events. It appears in Anticipating Rare Events: Can Acts of Terror, Use of Weapons of Mass Destruction or Other High Profile Acts Be Anticipated? A Scientific Perspective on Problems, Pitfalls and Prospective Solutions (N. Chesser, Ed.), which is an interdisciplinary review for operators in terrorism prevention within DoD/DHS/USG agencies.

In our paper, we focus on two insights from the field of social network analysis (SNA). First, innovation tends to happen at the periphery of social networks, rather than in a network’s core. New behaviors, insights, and events are more likely to occur when people with different backgrounds mix. Second, when a novel event involving new participants is being planned, we can observe signs of that activity in the network. New regions of the network will become active and new substructures will emerge. We conclude that an SNA-based approach for anticipating terrorism and other rare events would watch for two changes in structure: (1) new ties at the edges of networks, and (2) new involvement, with each other, of individuals who hold particular talents and resources.
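The first structural signal can be sketched mechanically: compare two network snapshots and flag new ties whose endpoints were peripheral (here crudely proxied by low degree) in the earlier snapshot. The edge lists, names, and degree threshold are all illustrative assumptions, not data from the paper:

```python
# Flag new ties between low-degree (peripheral) nodes across two
# snapshots of a network, given as edge lists.
from collections import Counter

def new_peripheral_ties(old_edges, new_edges, max_degree=1):
    deg = Counter()
    for a, b in old_edges:
        deg[a] += 1
        deg[b] += 1
    fresh = set(map(frozenset, new_edges)) - set(map(frozenset, old_edges))
    return [tuple(sorted(e)) for e in fresh
            if all(deg[n] <= max_degree for n in e)]

old = [("ann", "bob"), ("ann", "dia"), ("bob", "carl")]
new = old + [("dia", "eve"), ("ann", "carl")]
print(new_peripheral_ties(old, new))  # [('dia', 'eve')]
```

The tie between dia and a previously unseen node eve is flagged because both endpoints sit at the network's edge, while the new ann–carl tie is not, since ann was already well connected.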