Monday, October 09, 2017

Blade Runner 2049: Demis Hassabis (DeepMind) interviews director Villeneuve

Hassabis refers to AI in the original Blade Runner, but it is apparent from the sequel that replicants are merely genetically engineered humans. AI appears in Blade Runner 2049 in the form of Joi. There seems to be widespread confusion, including in the movie itself, about whether to think about replicants as robots (i.e., hardware) with "artificial" brains, or simply superhumans engineered (by manipulation of DNA and memories) to serve as slaves. The latter, while potentially very alien psychologically (detectable by Voight-Kampff machine, etc.), presumably have souls like ours. (Hassabis refers to Rutger Hauer's decision to have Roy Batty release the dove when he dies as symbolic of Batty's soul escaping from his body.)

Dick himself seems a bit imprecise in his use of the term android (hardware or wet bioware?) in this context. "Electric" sheep? In a bioengineered android brain that is structurally almost identical to a normal human's?

The Q&A at ~27min is excellent -- it concerns the dispute between Ridley Scott and Harrison Ford over whether Deckard is a replicant, and how Villeneuve handled it, inspired by the original Dick novel.

Addendum: Blade Runner, meet Alien

The Tyrell-Weyland connection

Robots (David, of the Alien prequel Prometheus) vs Genetically Engineered Slaves (replicants) with false memories

Saturday, October 07, 2017

Information Theory of Deep Neural Nets: "Information Bottleneck"

This talk discusses, in terms of information theory, how the hidden layers of a deep neural net (thought of as a Markov chain) create a compressed (coarse-grained) representation of the input information. To date, the success of neural networks has been a mainly empirical phenomenon, lacking a theoretical framework that explains how and why they work so well.

At ~44min someone asks how networks "know" to construct (local) feature detectors in the first few layers. I'm not sure I followed Tishby's answer but it may be a consequence of the hierarchical structure of the data, not specific to the network or optimization.
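The compression claim can be illustrated with a toy calculation (my own sketch, not Tishby's): if a hidden layer T is a deterministic coarse-graining of the input X, then Y → X → T is a Markov chain and the data-processing inequality guarantees I(T;Y) ≤ I(X;Y), while I(T;X) can be much smaller than H(X).

```python
import numpy as np

def mi(a, b):
    """Mutual information (in nats) of two discrete samples, via the joint histogram."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)
    p = joint / joint.sum()
    pa, pb = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (pa @ pb)[nz])).sum())

rng = np.random.default_rng(0)

# Input X: 8 equally likely discrete "patterns"; label Y depends only on X's high bit.
X = rng.integers(0, 8, size=100_000)
Y = (X >= 4).astype(int)

# "Hidden layer" T coarse-grains X from 8 states down to 4 by dropping the low bit.
T = X // 2

print(f"I(X;Y) = {mi(X, Y):.3f} nats")  # ~ ln 2: one bit of label information
print(f"I(T;Y) = {mi(T, Y):.3f} nats")  # ~ ln 2: compression kept the label info
print(f"I(T;X) = {mi(T, X):.3f} nats")  # ~ ln 4 < H(X) = ln 8: T compresses X
```

Here the dropped bit carries no label information, so compression is free; the information bottleneck tradeoff appears when further coarse-graining starts to destroy label-relevant bits.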
Naftali (Tali) Tishby נפתלי תשבי

Physicist, professor of computer science and computational neuroscientist
The Ruth and Stan Flinkman professor of Brain Research
Benin School of Engineering and Computer Science
Edmond and Lilly Safra Center for Brain Sciences (ELSC)
Hebrew University of Jerusalem, 96906 Israel

I work at the interfaces between computer science, physics, and biology which provide some of the most challenging problems in today’s science and technology. We focus on organizing computational principles that govern information processing in biology, at all levels. To this end, we employ and develop methods that stem from statistical physics, information theory and computational learning theory, to analyze biological data and develop biologically inspired algorithms that can account for the observed performance of biological systems. We hope to find simple yet powerful computational mechanisms that may characterize evolved and adaptive systems, from the molecular level to the whole computational brain and interacting populations.
Another Tishby talk on this subject.

Tuesday, October 03, 2017

A Gentle Introduction to Neural Networks

"A gentle introduction to the principles behind neural networks, including backpropagation. Rated G for general audiences."

This is very well done. If you have a quantitative background you can watch it at 1.5x or 2x speed, I think :-)

A bit more on the history of backpropagation and convexity: why is the error function convex, or nearly so?

Friday, September 29, 2017

The Vector Institute

I've waxed enthusiastic before about Thought Vectors:
... the space of concepts (primitives) used in human language (or equivalently, in human thought) ... has only ~1000 dimensions, and has some qualities similar to an actual vector space. Indeed, one can speak of some primitives being closer or further from others, leading to a notion of distance, and one can also rescale a vector to increase or decrease the intensity of meaning.

... we now have an automated method to extract an abstract representation of human thought from samples of ordinary language. This abstract representation will allow machines to improve dramatically in their ability to process language, dealing appropriately with semantics (i.e., meaning), which is represented geometrically.
Apparently I am not the only one (MIT Technology Review):
... The Vector Institute, this monument to the ascent of Hinton’s ideas, is a research center where companies from around the U.S. and Canada—like Google, and Uber, and Nvidia—will sponsor efforts to commercialize AI technologies. Money has poured in faster than Jacobs could ask for it; two of his cofounders surveyed companies in the Toronto area, and the demand for AI experts ended up being 10 times what Canada produces every year. Vector is in a sense ground zero for the now-worldwide attempt to mobilize around deep learning: to cash in on the technique, to teach it, to refine and apply it. Data centers are being built, towers are being filled with startups, a whole generation of students is going into the field.

... words that have similar meanings start showing up near one another in the space. That is, “insane” and “unhinged” will have coordinates close to each other, as will “three” and “seven,” and so on. What’s more, so-called vector arithmetic makes it possible to, say, subtract the vector for “France” from the vector for “Paris,” add the vector for “Italy,” and end up in the neighborhood of “Rome.” It works without anyone telling the network explicitly that Rome is to Italy as Paris is to France.

... Neural nets can be thought of as trying to take things—images, words, recordings of someone talking, medical data—and put them into what mathematicians call a high-dimensional vector space, where the closeness or distance of the things reflects some important feature of the actual world. Hinton believes this is what the brain itself does. “If you want to know what a thought is,” he says, “I can express it for you in a string of words. I can say ‘John thought, “Whoops.”’ But if you ask, ‘What is the thought? What does it mean for John to have that thought?’ It’s not that inside his head there’s an opening quote, and a ‘Whoops,’ and a closing quote, or even a cleaned-up version of that. Inside his head there’s some big pattern of neural activity.” Big patterns of neural activity, if you’re a mathematician, can be captured in a vector space, with each neuron’s activity corresponding to a number, and each number to a coordinate of a really big vector. In Hinton’s view, that’s what thought is: a dance of vectors.

... It is no coincidence that Toronto’s flagship AI institution was named for this fact. Hinton was the one who came up with the name Vector Institute.
See also Geoff Hinton on Deep Learning (discusses thought vectors).
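The vector-arithmetic property quoted above is easy to demonstrate with hand-made toy vectors (the coordinates below are invented for illustration; real embeddings are learned from text, e.g. by word2vec or GloVe):

```python
import numpy as np

# Pretend dimensions: [capital-city-ness, France-ness, Italy-ness, Germany-ness].
vecs = {
    "Paris":   np.array([1.0, 1.0, 0.0, 0.0]),
    "France":  np.array([0.0, 1.0, 0.0, 0.0]),
    "Rome":    np.array([1.0, 0.0, 1.0, 0.0]),
    "Italy":   np.array([0.0, 0.0, 1.0, 0.0]),
    "Berlin":  np.array([1.0, 0.0, 0.0, 1.0]),
    "Germany": np.array([0.0, 0.0, 0.0, 1.0]),
}

def nearest(v, exclude=()):
    """Vocabulary word whose vector has the highest cosine similarity to v."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vecs if w not in exclude), key=lambda w: cos(vecs[w], v))

# "Paris" - "France" + "Italy" lands in the neighborhood of "Rome".
query = vecs["Paris"] - vecs["France"] + vecs["Italy"]
print(nearest(query, exclude={"Paris", "France", "Italy"}))  # -> Rome
```

In a trained embedding no one designs the dimensions; the geometry emerges from co-occurrence statistics, which is the remarkable part.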

Thursday, September 28, 2017

Feynman, Schwinger, and Psychometrics

Slate Star Codex has a new post entitled Against Individual IQ Worries.
I write a lot about the importance of IQ research, and I try to debunk pseudoscientific claims that IQ “isn’t real” or “doesn’t matter” or “just shows how well you do on a test”. IQ is one of the best-studied ideas in psychology, one of our best predictors of job performance, future income, and various other forms of success, etc.

But every so often, I get comments/emails saying something like “Help! I just took an IQ test and learned that my IQ is x! This is much lower than I thought, and so obviously I will be a failure in everything I do in life. Can you direct me to the best cliff to jump off of?”

So I want to clarify: IQ is very useful and powerful for research purposes. It’s not nearly as interesting for you personally.
I agree with Scott's point that while g is useful as a crude measurement of cognitive ability, and a statistical predictor of life outcomes, one is better off adopting the so-called growth mindset. ("Individuals who believe their talents can be developed through hard work, good strategies, and input from others have a growth mindset.")

Inevitably the question of Feynman's IQ came up in the discussion. I wrote to Scott about this (slightly edited):
Dear Scott,

I enjoyed your most recent SSC post and I agree with you that g is better applied at a statistical level (e.g., by the Army to place recruits) than at an individual level.

I notice Feynman came up again in the discussion. I have written more on this topic (and have done more research as well). My conclusions are as follows:

1. There is no doubt Feynman would have scored near the top of any math-loaded test (and he did -- e.g., the Putnam).

2. I doubt Feynman would have scored near the ceiling on many verbally loaded tests. He often made grammatical mistakes, spelling mistakes (even of words commonly used in physics), etc. He occasionally did not know the *meanings* of terms used by other people around him (even words commonly used in physics).

3. By contrast, his contemporary and rival Julian Schwinger wrote and spoke in elegant, impeccable language. People often said that Schwinger "spoke in entire paragraphs" that emerged well-formed from his mouth. My guess is that Schwinger was a more balanced type for that level of cognitive ability. Feynman was verbally creative, colorful, a master communicator, etc. But his score on the old SAT-V might not have been above the top few percent.

Part of the reason more people know about Feynman than Schwinger is not just that Feynman was more colorful and charismatic. In fact, very little that Schwinger ever said or wrote was comprehensible to people below a pretty high IQ threshold, whereas Feynman expressed himself simply and intuitively. I think this has a bit to do with their verbal IQs. Even really smart physics students have an easier time understanding Feynman's articles and lectures than Schwinger's!

Schwinger had read (and understood) all of the existing literature on quantum mechanics while still a HS student -- this loads on V, not just M. Feynman's development path was different, partially because he had trouble reading other people's papers.

Schwinger was one of the subjects in Anne Roe's study of top scientists. His verbal score was above +4 SD. I think it's extremely unlikely that Feynman would have scored that high.

See links below for more discussion, examples, etc.

Hope you are enjoying Berkeley!


Feynman's Cognitive Style

Feynman and the Secret of Magic

Feynman's War

Schwinger meets Rabi

Roe's Scientists

Here are some (accessible) Schwinger quotes I like.
The pressure for conformity is enormous. I have experienced it in editors’ rejection of submitted papers, based on venomous criticism of anonymous referees. The replacement of impartial reviewing by censorship will be the death of science.

Is the purpose of theoretical physics to be no more than a cataloging of all the things that can happen when particles interact with each other and separate? Or is it to be an understanding at a deeper level in which there are things that are not directly observable (as the underlying quantized fields are) but in terms of which we shall have a more fundamental understanding?

To me, the formalism of quantum mechanics is not just mathematics; rather it is a symbolic account of the realities of atomic measurements. That being so, no independent quantum theory of measurement is required -- it is part and parcel of the formalism.

[ ... recapitulates usual von Neumann formulation: unitary evolution of wavefunction under "normal" circumstances; non-unitary collapse due to measurement ... discusses paper hypothesizing stochastic (dynamical) wavefunction collapse ... ]

In my opinion, this is a desperate attempt to solve a non-existent problem, one that flows from a false premise, namely the vN dichotomization of quantum mechanics. Surely physicists can agree that a microscopic measurement is a physical process, to be described as would any physical process, that is distinguished only by the effective irreversibility produced by amplification to the macroscopic level. ...

(See Schwinger on Quantum Foundations ;-)
Schwinger survived both Feynman and Tomonaga, with whom he shared the Nobel prize for quantum electrodynamics. He began his eulogy for Feynman: "I am the last of the triumvirate ..."

Tuesday, September 26, 2017

The Vietnam War, Ken Burns and Lynn Novick

Ken Burns' Vietnam documentary is incredibly good. Possibly the best documentary I've ever seen. It's heartbreaking tragedy, with perspectives from all sides of the conflict: Americans and North and South Vietnamese, soldiers from both sides, war protestors, war planners, families of sons and daughters who died in the war.

I was a child when the war was winding down, so the America of the documentary is very familiar to me.

Here's the PBS web page from which you can stream all 18 hours. I have been watching the version that contains unedited explicit language and content (not broadcast).

Tuesday, September 19, 2017

Accurate Genomic Prediction Of Human Height

I've been posting preprints on arXiv since its beginning ~25 years ago, and I like to share research results as soon as they are written up. Science functions best through open discussion of new results! After some internal deliberation, my research group decided to post our new paper on genomic prediction of human height on bioRxiv and arXiv.

But the preprint culture is nascent in many areas of science (e.g., biology), and it seems to me that some journals are not yet fully comfortable with the idea. I was pleasantly surprised to learn, just in the last day or two, that most journals now have official policies that allow online distribution of preprints prior to publication. (This has been the case in theoretical physics since before I entered the field!) Let's hope that progress continues.

The work presented below applies ideas from compressed sensing, L1 penalized regression, etc. to genomic prediction. We exploit the phase transition behavior of the LASSO algorithm to construct a good genomic predictor for human height. The results are significant for the following reasons:
We applied novel machine learning methods ("compressed sensing") to ~500k genomes from UK Biobank, resulting in an accurate predictor for human height which uses information from thousands of SNPs.

1. The actual heights of most individuals in our replication tests are within a few cm of their predicted height.

2. The variance captured by the predictor is similar to the estimated GCTA-GREML SNP heritability. Thus, our results resolve the missing heritability problem for common SNPs.

3. Out-of-sample validation on ARIC individuals (a US cohort) shows the predictor works on that population as well. The SNPs activated in the predictor overlap with previous GWAS hits from GIANT.
The scatterplot figure below gives an immediate feel for the accuracy of the predictor.
Accurate Genomic Prediction Of Human Height

Louis Lello, Steven G. Avery, Laurent Tellier, Ana I. Vazquez, Gustavo de los Campos, and Stephen D.H. Hsu

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ∼40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the “missing heritability” problem – i.e., the gap between prediction R-squared and SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
This figure compares predicted and actual height on a validation set of 2000 individuals not used in training: males + females, actual heights (vertical axis) uncorrected for gender. For training we z-score by gender and age (due to Flynn Effect for height). We have also tested validity on a population of US individuals (i.e., out of sample; not from UKBB).
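The z-scoring step might look like the following minimal pandas sketch (the column names are my assumptions, and the age adjustment described above is omitted for brevity):

```python
import pandas as pd

# Toy phenotype table standing in for UKBB records.
df = pd.DataFrame({
    "sex":    ["M", "M", "M", "F", "F", "F"],
    "height": [178.0, 172.0, 181.0, 165.0, 160.0, 170.0],
})

# Standardize within each sex so the regression target has zero mean and unit
# variance per group; the male/female mean-height difference is removed before training.
df["z"] = df.groupby("sex")["height"].transform(lambda h: (h - h.mean()) / h.std())
print(df)
```

Predictions in z-units are then mapped back to cm using the group mean and standard deviation, which is why the validation scatterplot can show males and females together.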

This figure illustrates the phase transition behavior at fixed sample size n and varying penalization lambda.

These are the SNPs activated in the predictor -- about 20k in total, uniformly distributed across all chromosomes; vertical axis is effect size of minor allele:

The big picture implication is that heritable complex traits controlled by thousands of genetic loci can, with enough data and analysis, be predicted from DNA. I expect that with good genotype | phenotype data from a million individuals we could achieve similar success with cognitive ability. We've also analyzed the sample size requirements for disease risk prediction, and they are similar (i.e., ~100 times sparsity of the effects vector; so ~100k cases + controls for a condition affected by ~1000 loci).

Note Added: Further comments in response to various questions about the paper.

1) We have tested the predictor on other ethnic groups and there is an (expected) decrease in correlation that is roughly proportional to the "genetic distance" between the test population and the white/British training population. This is likely due to different LD structure (SNP correlations) in different populations. A SNP which tags the true causal genetic variation in the Euro population may not be a good tag in, e.g., the Chinese population. We may report more on this in the future. Note, despite the reduction in power our predictor still captures more height variance than any other existing model for S. Asians, Chinese, Africans, etc.

2) We did not explore the biology of the activated SNPs because that is not our expertise. GWAS hits found by SSGAC, GIANT, etc. have already been connected to biological processes such as neuronal growth, bone development, etc. Plenty of follow up work remains to be done on the SNPs we discovered.

3) Our initial reduction of candidate SNPs to the top 50k or 100k is simply to save computational resources. The L1 algorithms can handle much larger values of p, but keeping all of those SNPs in the calculation is extremely expensive in CPU time, memory, etc. We tested computational cost vs benefit in improved prediction from including more (>100k) candidate SNPs in the initial cut but found it unfavorable. (Note, we also had a reasonable prior that ~10k SNPs would capture most of the predictive power.)

4) We will have more to say about nonlinear effects, additional out-of-sample tests, other phenotypes, etc. in future work.

5) Perhaps most importantly, we have a useful theoretical framework (compressed sensing) within which to think about complex trait prediction. We can make quantitative estimates for the sample size required to "solve" a particular trait.

I leave you with some remarks from Francis Crick:
Crick had to adjust from the "elegance and deep simplicity" of physics to the "elaborate chemical mechanisms that natural selection had evolved over billions of years." He described this transition as, "almost as if one had to be born again." According to Crick, the experience of learning physics had taught him something important — hubris — and the conviction that since physics was already a success, great advances should also be possible in other sciences such as biology. Crick felt that this attitude encouraged him to be more daring than typical biologists who tended to concern themselves with the daunting problems of biology and not the past successes of physics.

Friday, September 15, 2017

Phase Transitions and Genomic Prediction of Cognitive Ability

James Thompson (University College London) recently blogged about my prediction that with sample size of order a million genotypes|phenotypes, one could construct a good genomic predictor for cognitive ability and identify most of the associated common SNPs.
The Hsu Boundary

... The “Hsu boundary” is Steve Hsu’s estimate that a sample size of roughly 1 million people may be required to reliably identify the genetic signals of intelligence.

... the behaviour of an optimization algorithm involving a million variables can change suddenly as the amount of data available increases. We see this behavior in the case of Compressed Sensing applied to genomes, and it allows us to predict that something interesting will happen with complex traits like cognitive ability at a sample size of the order of a million individuals.

Machine learning is now providing new methods of data analysis, and this may eventually simplify the search for the genes which underpin intelligence.
There are many comments on Thompson's blog post, some of them confused. Comments from a user "Donoho-Student" are mostly correct -- he or she seems to understand the subject. (The phase transition discussed is related to the Donoho-Tanner phase transition. More from Igor Carron.)

The chain of logic leading to this prediction has been discussed here before. The excerpt below is from a 2013 post The human genome as a compressed sensor:

Compressed sensing (see also here) is a method for efficient solution of underdetermined linear systems: y = Ax + noise , using a form of penalized regression (L1 penalization, or LASSO). In the context of genomics, y is the phenotype, A is a matrix of genotypes, x a vector of effect sizes, and the noise is due to nonlinear gene-gene interactions and the effect of the environment. (Note the figure above, which I found on the web, uses different notation than the discussion here and the paper below.)

Let p be the number of variables (i.e., genetic loci = dimensionality of x), s the sparsity (number of variables or loci with nonzero effect on the phenotype = nonzero entries in x) and n the number of measurements of the phenotype (i.e., the number of individuals in the sample = dimensionality of y). Then  A  is an  n x p  dimensional matrix. Traditional statistical thinking suggests that  n > p  is required to fully reconstruct the solution  x  (i.e., reconstruct the effect sizes of each of the loci). But recent theorems in compressed sensing show that  n > C s log p  is sufficient if the matrix A has the right properties (is a good compressed sensor). These theorems guarantee that the performance of a compressed sensor is nearly optimal -- within an overall constant of what is possible if an oracle were to reveal in advance which  s  loci out of  p  have nonzero effect. In fact, one expects a phase transition in the behavior of the method as  n  crosses a critical threshold given by the inequality. In the good phase, full recovery of  x  is possible.

In the paper below, available on arxiv, we show that

1. Matrices of human SNP genotypes are good compressed sensors and are in the universality class of random matrices. The phase behavior is controlled by scaling variables such as  rho = s/n  and our simulation results predict the sample size threshold for future genomic analyses.

2. In applications with real data the phase transition can be detected from the behavior of the algorithm as the amount of data  n  is varied. A priori knowledge of  s  is not required; in fact one deduces the value of  s  this way.

3.  For heritability h2 = 0.5 and p ~ 1E06 SNPs, the value of  C log p  is ~ 30. For example, a trait which is controlled by s = 10k loci would require a sample size of n ~ 300k individuals to determine the (linear) genetic architecture.
For more posts on compressed sensing, L1-penalized optimization, etc. see here. Because s could be larger than 10k, because the common SNP heritability of cognitive ability might be less than 0.5, because the phenotype measurements are noisy, and because a million is a nice round figure, I usually give that as my rough estimate of the critical sample size for good results. The estimate that s ~ 10k for cognitive ability and height originates here, but is now supported by other work: see, e.g., Estimation of genetic architecture for complex traits using GWAS data.
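The phase transition is easy to see in a small self-contained simulation (plain Gaussian matrices and a textbook iterative soft-thresholding solver, not the paper's genotype data or production LASSO code):

```python
import numpy as np

rng = np.random.default_rng(42)
p, s = 1000, 10                                   # variables ("SNPs") and true sparsity

beta = np.zeros(p)                                # sparse effect-size vector x
beta[rng.choice(p, s, replace=False)] = rng.normal(size=s)

def lasso_ista(A, y, lam=0.01, iters=5000):
    """L1-penalized least squares, minimized by iterative soft-thresholding (ISTA)."""
    n = len(y)
    L = np.linalg.norm(A, 2) ** 2 / n             # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        w = x - A.T @ (A @ x - y) / (n * L)       # gradient step on the squared error
        x = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # soft-threshold
    return x

def recovery_corr(n):
    """Correlation of the LASSO estimate with the true effects at sample size n."""
    A = rng.normal(size=(n, p))                   # random stand-in for a genotype matrix
    y = A @ beta                                  # noiseless (h2 = 1) phenotype
    return np.corrcoef(lasso_ista(A, y), beta)[0, 1]

# s log p ~ 69 here: recovery fails well below that scale and succeeds well above it.
print(f"n = 30:  corr = {recovery_corr(30):.2f}")   # below the boundary: poor
print(f"n = 300: corr = {recovery_corr(300):.2f}")  # above the boundary: good recovery
```

Plugging in the paper's numbers instead: with C log p ~ 30 at p ~ 1E06 SNPs and s ~ 10k, the threshold is n ~ 300k; phenotype noise and a possibly larger s are what push my round-number estimate up to a million.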

We have recently finished analyzing height using L1-penalization and the phase transition technique on a very large data set (many hundreds of thousands of individuals). The paper has been submitted for review, and the results support the claims made above with s ~ 10k, h2 ~ 0.5 for height.

Added: Here are comments from "Donoho-Student":
Donoho-Student says:
September 14, 2017 at 8:27 pm GMT

The Donoho-Tanner transition describes the noise-free (h2=1) case, which has a direct analog in the geometry of polytopes.

The n = 30s result from Hsu et al. (specifically the value of the coefficient, 30, when p is the appropriate number of SNPs on an array and h2 = 0.5) is obtained via simulation using actual genome matrices, and is original to them. (There is no simple formula that gives this number.) The D-T transition had only been established in the past for certain classes of matrices, like random matrices with specific distributions. Those results cannot be immediately applied to genomes.

The estimate that s is (order of magnitude) 10k is also a key input.

I think Hsu refers to n = 1 million instead of 30 * 10k = 300k because the effective SNP heritability of IQ might be less than h2 = 0.5 — there is noise in the phenotype measurement, etc.

Donoho-Student says:
September 15, 2017 at 11:27 am GMT

Lasso is a common statistical method but most people who use it are not familiar with the mathematical theorems from compressed sensing. These results give performance guarantees and describe phase transition behavior, but because they are rigorous theorems they only apply to specific classes of sensor matrices, such as simple random matrices. Genomes have correlation structure, so the theorems do not directly apply to the real world case of interest, as is often true.

What the Hsu paper shows is that the exact D-T phase transition appears in the noiseless (h2 = 1) problem using genome matrices, and a smoothed version appears in the problem with realistic h2. These are new results, as is the prediction for how much data is required to cross the boundary. I don’t think most gwas people are familiar with these results. If they did understand the results they would fund/design adequately powered studies capable of solving lots of complex phenotypes, medical conditions as well as IQ, that have significant h2.

Most people who use lasso, as opposed to people who prove theorems, are not even aware of the D-T transition. Even most people who prove theorems have followed the Candes-Tao line of attack (restricted isometry property) and don’t think much about D-T. Although D eventually proved some things about the phase transition using high dimensional geometry, it was initially discovered via simulation using simple random matrices.

Wednesday, September 13, 2017

"Helicopter parents produce bubble wrapped kids"

Heterodox Academy. In my opinion these are reasonable center-left (Haidt characterizes himself as "liberal left") people whose views would have been completely acceptable on campus just 10 or 20 years ago. Today they are under attack for standing up for freedom of speech and diversity of thought.

Sunday, September 10, 2017

Bannon Unleashed

[ These embedded clips were annoyingly set to autoplay, so I have removed them. ]

Most of this short segment was edited out of the long interview shown on 60 Minutes (see video below).

Bannon denounces racism and endorses Citizenism. See also The Bannon Channel.

Paraphrasing slightly:
Economic nationalism inclusive of all races, religions, and sexual preferences -- as long as you're a citizen of our country.

The smart Democrats are trying to get the identity politics out of the party. The winning strategy will be populism -- the only question is whether it will be a left-wing populism or right-wing populism. We'll see in 2020.
This is the longer interview, with no quarter given to a dumbfounded Charlie Rose:

[ Clip removed ]

This is a 1 hour video that aired on PBS. There are amazing details about the 2016 campaign from Bannon the deep insider. If you followed the election closely you will be very interested in this interview. (In case this video is taken down you might find the content here.)

Varieties of Snowflakes

I was pleasantly surprised that New Yorker editor David Remnick and Berkeley law professor Melissa Murray continue to support the First Amendment, even if some of her students do not. Remnick gives historian Mark Bray (author of Antifa: The Anti-Fascist Handbook) a tough time about the role of violence in political movements.

After Charlottesville, the Limits of Free Speech

David Remnick speaks with the author of a new and sympathetic book about Antifa, a law professor at University of California, Berkeley, and a legal analyst for Slate, to look at how leftist protests at Berkeley, right-wing violence in Charlottesville, and open-carry laws around the country are testing the traditional liberal consensus on freedom of expression.

Thursday, September 07, 2017

BENEFICIAL AI 2017 (Asilomar meeting)

AI researcher Yoshua Bengio gives a nice overview of recent progress in Deep Learning, and provides some perspective on challenges that must be overcome to achieve AGI (i.e., human-level general intelligence). I agree with Bengio that the goal is farther than the recent wave of excitement might lead one to believe.

There were many other interesting talks at the BENEFICIAL AI 2017 meeting held in Asilomar CA. (Some may remember the famous Asilomar meeting on recombinant DNA in 1975.)

Here's a panel discussion Creating Human-level AI: How and When?

If you like speculative discussion, this panel on Superintelligence should be of interest:

Tuesday, September 05, 2017

DeepMind and StarCraft II Learning Environment

This Learning Environment will enable researchers to attack the problem of building an AI that plays StarCraft II at a high level. As observed in the video, this infrastructure development required significant investment of resources by DeepMind / Alphabet. Now, researchers in academia and elsewhere have a platform from which to explore an important class of AI problems that are related to real world strategic planning. Although StarCraft is "just" a video game, it provides a rich virtual laboratory for machine learning.
StarCraft II: A New Challenge for Reinforcement Learning

This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially observed map; it has a large action space involving the selection and control of hundreds of units; it has a large state space that must be observed solely from raw input feature planes; and it has delayed credit assignment requiring long-term strategies over thousands of steps. We describe the observation, action, and reward specification for the StarCraft II domain and provide an open source Python-based interface for communicating with the game engine. In addition to the main game maps, we provide a suite of mini-games focusing on different elements of StarCraft II gameplay. For the main game maps, we also provide an accompanying dataset of game replay data from human expert players. We give initial baseline results for neural networks trained from this data to predict game outcomes and player actions. Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain. On the mini-games, these agents learn to achieve a level of play that is comparable to a novice player. However, when trained on the main game, these agents are unable to make significant progress. Thus, SC2LE offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures.
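The observation/action/reward specification the abstract describes is the standard reinforcement-learning interface. The sketch below uses a stub environment with roughly SC2LE's shapes (stacked feature-plane observations, a small discrete action set, scalar reward); the real interface is DeepMind's open-source pysc2 package, whose actual API differs from this toy:

```python
import numpy as np

class StubEnv:
    """Stand-in with the shape of an SC2LE step: feature-plane observation,
    scalar reward, and an episode-termination flag (the real env lives in pysc2)."""
    def __init__(self, planes=3, size=8, horizon=20):
        self.planes, self.size, self.horizon, self.t = planes, size, horizon, 0

    def reset(self):
        self.t = 0
        return np.zeros((self.planes, self.size, self.size))

    def step(self, action):
        self.t += 1
        obs = np.random.rand(self.planes, self.size, self.size)  # raw feature planes
        reward = 1.0 if action == 0 else 0.0                     # toy reward rule
        return obs, reward, self.t >= self.horizon

# The canonical agent-environment interaction loop an RL agent runs against SC2LE.
env = StubEnv()
obs, done, ret = env.reset(), False, 0.0
while not done:
    action = int(obs.sum() * 100) % 4   # placeholder policy over 4 discrete actions
    obs, reward, done = env.step(action)
    ret += reward
print("episode return:", ret)
```

The hard parts the paper highlights (huge action space, partial observability, credit assignment over thousands of such steps) all live inside what this stub trivializes.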

Friday, September 01, 2017

Lax on vN: "He understood in an instant"

Mathematician Peter Lax (awarded National Medal of Science, Wolf and Abel prizes), interviewed about his work on the Manhattan Project. His comments on von Neumann and Feynman:
Lax: ... Von Neumann was very deeply involved in Los Alamos. He realized that computers would be needed to carry out the calculations needed. So that was, I think, his initial impulse in developing computers. Of course, he realized that computing would be important for every highly technical project, not just atomic energy. He was the most remarkable man. I’m always utterly surprised that his name is not common, household.

It is a name that should be known to every American—in fact, every person in the world, just as the name of Einstein is. I am always utterly surprised how come he’s almost totally unknown. ... All people who had met him and interacted with him realized that his brain was more powerful than anyone’s they have ever encountered. I remember Hans Bethe even said, only half in jest, that von Neumann’s brain was a new development of the human brain. Only a slight exaggeration.

... People today have a hard time to imagine how brilliant von Neumann was. If you talked to him, after three words, he took over. He understood in an instant what the problem was and had ideas. Everybody wanted to talk to him.


Kelly: I think another person that you mention is Richard Feynman?

Lax: Yes, yes, he was perhaps the most brilliant of the people there. He was also somewhat eccentric. He played the bongo drums. But everybody admired his brilliance. [ vN was a consultant and only visited Los Alamos occasionally. ]
Full transcript. See also Another species, an evolution beyond man.

Wednesday, August 30, 2017

Normies Lament

This interview with the Irish Times (not Ezra Klein) is much better than the one I originally linked to below.


Ezra Klein talks to Angela Nagle. It's still normie normative, but Nagle has at least done some homework.

Click the link below to hear the podcast.
From 4Chan to Charlottesville: where the alt-right came from, and where it's going

Angela Nagle spent the better part of the past decade in the darkest corners of the internet, learning how online subcultures emerge and thrive on forums like 4chan and Tumblr.

The result is her fantastic new book, Kill All the Normies: Online Culture Wars From 4Chan And Tumblr to Trump and the Alt-Right, a comprehensive exploration of the origins of our current political moment.

We talk about the origins of the alt-right, and how the movement morphed from transgressive aesthetics on the internet to the violence in Charlottesville, but we also discuss PC culture on the left, demographic change in America, and the toxicity of online politics in general. Nagle is particularly interested in how the left's policing of language radicalizes its victims and creates space for alt-right groups to find eager recruits, and so we dive deep into that.


Civilization and Its Discontents by Sigmund Freud

This Is Why We Can't Have Nice Things: Mapping the Relationship between Online Trolling and Mainstream Culture by Whitney Phillips

The Net Delusion: The Dark Side of Internet Freedom by Evgeny Morozov

Friday, August 25, 2017

Job Opening in Computational Genomics

A VC-funded genomics startup I am familiar with is searching for someone to apply computational methods to complex human traits (e.g., polygenic disease risk).

The ideal candidate would be someone from Physics or CS or other quantitative discipline, interested in computational genomics and data science. Strong background in computation required. Advanced degree a plus, but not required.

Location is in NJ, just outside NYC.

Send your resume to

Sunday, August 20, 2017

Ninety-nine genetic loci influencing general cognitive function

The paper below has something like 200 authors from over 100 institutions worldwide.

Many people claimed just a few years ago (or more recently!) that results like this were impossible. Will they admit their mistake?

In Scientific Consensus on Cognitive Ability? I described the current consensus among experts as follows.
0. Intelligence is (at least crudely) measurable
1. Intelligence is highly heritable (much of the variance is determined by DNA)
2. Intelligence is highly polygenic (controlled by many genetic variants, each of small effect)
3. Intelligence is going to be deciphered at the molecular level, in the near future, by genomic studies with very large sample size
See figures below for a summary of progress over the last six years. Note 4% of total variance = 1/25 and sqrt(1/25) = 1/5, so a predictor built from these variants would correlate ~0.2 with actual cognitive ability. There is still much more variance to be discovered with larger samples, of course.
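The variance-to-correlation arithmetic above can be checked with a quick simulation (a sketch; the sample size and seed are arbitrary): a predictor explaining 4% of phenotype variance correlates ~0.2 with the phenotype.

```python
import math
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(1)
var_explained = 0.04  # 4% of total (unit) variance
n = 200_000

# phenotype = predictor + independent noise, total variance 1
g = [rng.gauss(0, math.sqrt(var_explained)) for _ in range(n)]
e = [rng.gauss(0, math.sqrt(1 - var_explained)) for _ in range(n)]
y = [gi + ei for gi, ei in zip(g, e)]

r = corr(g, y)  # empirical correlation, close to sqrt(0.04) = 0.2
```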
Ninety-nine independent genetic loci influencing general cognitive function include genes associated with brain health and structure (N = 280,360)

General cognitive function is a prominent human trait associated with many important life outcomes including longevity. The substantial heritability of general cognitive function is known to be polygenic, but it has had little explication in terms of the contributing genetic variants. Here, we combined cognitive and genetic data from the CHARGE and COGENT consortia, and UK Biobank (total N=280,360). We found 9,714 genome-wide significant SNPs in 99 independent loci. Most showed clear evidence of functional importance. Among many novel genes associated with general cognitive function were SGCZ, ATXN1, MAPT, AUTS2, and P2RY6. Within the novel genetic loci were variants associated with neurodegenerative disorders, neurodevelopmental disorders, physical and psychiatric illnesses, brain structure, and BMI. Gene-based analyses found 536 genes significantly associated with general cognitive function; many were highly expressed in the brain, and associated with neurogenesis and dendrite gene sets. Genetic association results predicted up to 4% of general cognitive function variance in independent samples. There was significant genetic overlap between general cognitive function and information processing speed, as well as many health variables including longevity.

Chinese Social Media Notices US Cultural Revolution

The joke below is making the rounds on Chinese social media.

See Struggles at Yale and Baizuo = Libtard.

Also circulating on Chinese social media: A Report on the Cultural Revolution in the United States.

Yes, an entire country can go crazy for a decade...
Cultural Revolution (Wikipedia): The Cultural Revolution, formally the Great Proletarian Cultural Revolution, was a sociopolitical movement that took place in China from 1966 until 1976. Set into motion by Mao Zedong, then Chairman of the Communist Party of China, its stated goal was to preserve 'true' Communist ideology in the country by purging remnants of capitalist and traditional elements from Chinese society, and to re-impose Maoist thought as the dominant ideology within the Party. ...

The movement paralyzed China politically and negatively affected the country's economy and society to a significant degree. ...

Libraries full of historical and foreign texts were destroyed; books were burned. Temples, churches, mosques, monasteries, and cemeteries were closed down ...

The Bannon Channel

Rumor has it that Bannon will start a Breitbart TV channel to rival Fox News. Given the success of YouTubers and podcasters like Joe Rogan (5 million downloads per episode), it's plausible this could be done with very modest capex (the channel could start out as pure streaming and only go to cable later). Billionaire Robert Mercer (Renaissance Technologies) is a likely backer.

(The headline above actually appeared on the front page of the Huffington Post when it was announced that Bannon would leave the White House. It was quickly replaced with the headline White Flight -- swing state voters and 2018 midterm election voters will not forget either headline, I predict.)

Bannon's new channel can denounce Richard Spencer, Nazis, KKK, etc. but still push economic nationalism and even mild white identity slogans like "white people have rights, too" or "everyone should be treated as an individual, based on merit" ... (gee, that last one seems pretty principled... maybe they could add something catchy like "the content of their character" or something like that).

They can keep winning on a populist platform with just 4 messages:

1. Economic Nationalism
2. No foreign wars
3. Reform immigration
4. Stop PC excesses

Again, not really to the right of Fox, but just more consistent and less corporate GOP ("cuck") content. Tucker Carlson and Hannity could move to the new network without changing their schtick in the slightest.

Mercer would be smart to back this based entirely on financial ROI. There is massive pent-up demand: Trump got about half the popular vote, but currently no major media outlet is aligned with the views of his supporters.

Friday, August 18, 2017

Embryo Selection in China (Nature)

China’s embrace of embryo selection raises thorny questions (Nature)

Fertility centres are making a massive push to increase preimplantation genetic diagnosis in a bid to eradicate certain diseases.

... Early experiments are beginning to show how genome-editing technologies such as CRISPR might one day fix disease-causing mutations before embryos are implanted. But refining the techniques and getting regulatory approval will take years. PGD has already helped thousands of couples. And whereas the expansion of PGD around the world has generally been slow, in China, it is starting to explode.

... Genetic screening during pregnancy for chromosomal abnormalities linked to maternal age has taken off throughout the country, and many see this as a precursor to wider adoption of PGD.

Although Chinese fertility doctors were late to the game in adopting the procedure, they have been pursuing a more aggressive, comprehensive and systematic path towards its use there than anywhere else. The country’s central government, known for its long-term thinking, has over the past decade stepped up efforts to bring high-quality health care to the people, and its current 5-year plan has made reproductive medicine, including PGD, a priority ...

Comprehensive figures are difficult to come by, but estimates from leading PGD providers show that China’s use of the technique already outpaces that in the United States, and it is growing up to five times faster.

... there are concerns about the push to select for non-disease-related traits, such as intelligence or athletic ability. The ever-present spectre of eugenics lurks in the shadows. But in China, although these concerns are considered, most thoughts are focused on the benefits of the procedures.

... And the centres with licences to do PGD have created a buzz in their race to claim firsts with the technology. In 2015, CITIC-Xiangya boasted China’s first “cancer-free baby”. The boy’s parents had terminated a prior pregnancy after genetic testing showed the presence of retinoblastoma, a cancer that forms in the eyes during early development and often leads to blindness. In their next try, the couple used PGD to ensure that the gene variant that causes retinoblastoma wasn’t present. Other groups have helped couples to avoid passing on a slew of conditions: short-rib-polydactyly syndrome, Brittle-bone disease, Huntington’s disease, polycystic kidney disease and deafness, among others. ...

Joe Leigh Simpson, a medical geneticist at Florida International University in Miami, and former president of the Preimplantation Genetic Diagnosis International Society, is impressed by the quality and size of the Chinese fertility clinics. They “are superb and have gigantic units. They came out of nowhere in just 2 or 3 years,” he says. ...

People in China seem more likely to feel an obligation to bear the healthiest child possible than to protect an embryo. The Chinese appetite for using genetic technology to ensure healthy births can be seen in the rapid rise of pregnancy testing for Down’s syndrome and other chromosomal abnormalities. Since Shenzhen-based BGI introduced a test for Down’s syndrome in 2013, it has sold more than 2 million kits; half of those sales were in the past year.

... The Chinese word for eugenics, yousheng, is used explicitly as a positive in almost all conversations about PGD. Yousheng is about giving birth to children of better quality. Not smoking during pregnancy is also part of yousheng. ...
优生学 加油! ("Eugenics: keep it up!")

Wednesday, August 16, 2017

Meet the Bot: OpenAI and Dota 2

OpenAI has created a Dota 2 bot that plays at the level of human professionals. Humans can look forward to coexistence with increasingly clever AIs in both virtual and real world settings. See also Robots taking our jobs.
OpenAI: Dota 1v1 is a complex game with hidden information. Agents must learn to plan, attack, trick, and deceive their opponents. The correlation between player skill and actions-per-minute is not strong, and in fact, our AI’s actions-per-minute are comparable to that of an average human player.

Success in Dota requires players to develop intuitions about their opponents and plan accordingly. In the above video you can see that our bot has learned — entirely via self-play — to predict where other players will move, to improvise in response to unfamiliar situations, and how to influence the other player’s allied units to help it succeed.
About the game ("Defense of the Ancients").
Wikipedia: Dota 2 is played in matches between two teams of five players, with each team occupying and defending their own separate base on the map. Each of the ten players independently controls a powerful character, known as a "hero", who all have unique abilities and styles of play. During a match, a player and their team collects experience points and items for their heroes in order to fight through the opposing team's heroes and other defenses. A team wins by being the first to destroy a large structure located in the opposing team's base, called the "Ancient".
Related: this is a nice recent interview with Demis Hassabis of Deep Mind. He talks a bit about Go innovation resulting from AlphaGo.

Monday, August 14, 2017

Estimation of genetic architecture for complex traits using GWAS data

These authors extrapolate from existing data to predict the sample sizes needed to identify SNPs that explain a large portion of heritability for a variety of traits. For cognitive ability (see red curves in the figure below), they predict that sample sizes of roughly one million individuals will suffice.

See also More Shock and Awe: James Lee and SSGAC in Oslo, 600 SNP hits.
Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits and implications for the future

Yan Zhang, Guanghao Qi, Ju-Hyun Park, Nilanjan Chatterjee (Johns Hopkins University)

Summary-level statistics from genome-wide association studies are now widely used to estimate heritability and co-heritability of traits using the popular linkage-disequilibrium-score (LD-score) regression method. We develop a likelihood-based approach for analyzing summary-level statistics and external LD information to estimate common variants effect-size distributions, characterized by proportion of underlying susceptibility SNPs and a flexible normal-mixture model for their effects. Analysis of summary-level results across 32 GWAS reveals that while all traits are highly polygenic, there is wide diversity in the degrees of polygenicity. The effect-size distributions for susceptibility SNPs could be adequately modeled by a single normal distribution for traits related to mental health and ability and by a mixture of two normal distributions for all other traits. Among quantitative traits, we predict the sample sizes needed to identify SNPs which explain 80% of GWAS heritability to be between 300K-500K for some of the early growth traits, between 1-2 million for some anthropometric and cholesterol traits and multiple millions for body mass index and some others. The corresponding predictions for disease traits are between 200K-400K for inflammatory bowel diseases, close to one million for a variety of adult onset chronic diseases and between 1-2 million for psychiatric diseases.
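The spike-and-mixture effect-size model described in the abstract can be sketched as follows: a small fraction of SNPs are susceptibility SNPs, and their effects are drawn from a one- or two-component normal mixture. All parameter values here (causal fraction, component SDs, mixture weight) are hypothetical, chosen only for illustration — they are not the paper's fitted values.

```python
import random

def sample_effect_sizes(n_snps, p_causal, pi, sd1, sd2, seed=0):
    """Draw per-SNP effect sizes: a fraction p_causal of SNPs are
    susceptibility SNPs; their effects come from a two-component
    normal mixture with weight pi on the first component.
    Setting pi = 1.0 recovers the single-normal case the paper
    finds adequate for mental health / ability traits."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_snps):
        if rng.random() >= p_causal:
            effects.append(0.0)  # non-susceptibility SNP: zero effect
        else:
            sd = sd1 if rng.random() < pi else sd2
            effects.append(rng.gauss(0.0, sd))
    return effects

# Hypothetical parameters, for illustration only.
effects = sample_effect_sizes(100_000, p_causal=0.01,
                              pi=0.9, sd1=0.01, sd2=0.05)
n_causal = sum(1 for e in effects if e != 0.0)  # ~1% of 100,000
```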

This figure shows predicted effect-size distributions for a number of quantitative traits. You can see that height and intelligence are somewhat different, but not dramatically so.

Thursday, August 10, 2017

Meanwhile, down on the Farm

The Spring 2017 issue of the Stanford Medical School magazine has a special theme: Sex, Gender, and Medicine. I recommend the article excerpted below to journalists covering the Google Manifesto / James Damore firing. After reading it, they can decide for themselves whether his memo is based on established neuroscience or bro-pseudoscience.

Perhaps top Google executives will want to head down the road to Stanford for a refresher course in reality.

Stanford Neuroscience Professor Nirao Shah and Diane Halpern, past president of the American Psychological Association, would both make excellent expert witnesses in the Trial of the Century.
Two minds: The cognitive differences between men and women

... Nirao Shah decided in 1998 to study sex-based differences in the brain ... “I wanted to find and explore neural circuits that regulate specific behaviors,” says Shah, then a newly minted Caltech PhD who was beginning a postdoctoral fellowship at Columbia. So, he zeroed in on sex-associated behavioral differences in mating, parenting and aggression.

“These behaviors are essential for survival and propagation,” says Shah, MD, PhD, now a Stanford professor of psychiatry and behavioral sciences and of neurobiology. “They’re innate rather than learned — at least in animals — so the circuitry involved ought to be developmentally hard-wired into the brain. These circuits should differ depending on which sex you’re looking at.”

His plan was to learn what he could about the activity of genes tied to behaviors that differ between the sexes, then use that knowledge to help identify the neuronal circuits — clusters of nerve cells in close communication with one another — underlying those behaviors.

At the time, this was not a universally popular idea. The neuroscience community had largely considered any observed sex-associated differences in cognition and behavior in humans to be due to the effects of cultural influences. Animal researchers, for their part, seldom even bothered to use female rodents in their experiments, figuring that the cyclical variations in their reproductive hormones would introduce confounding variability into the search for fundamental neurological insights.

But over the past 15 years or so, there’s been a sea change as new technologies have generated a growing pile of evidence that there are inherent differences in how men’s and women’s brains are wired and how they work.

... There was too much data pointing to the biological basis of sex-based cognitive differences to ignore, Halpern says. For one thing, the animal-research findings resonated with sex-based differences ascribed to people. These findings continue to accrue. In a study of 34 rhesus monkeys, for example, males strongly preferred toys with wheels over plush toys, whereas females liked both kinds of toys. It would be tough to argue that the monkeys’ parents bought them sex-typed toys or that simian society encourages its male offspring to play more with trucks. A much more recent study established that boys and girls 9 to 17 months old — an age when children show few if any signs of recognizing either their own or other children’s sex — nonetheless show marked differences in their preference for stereotypically male versus stereotypically female toys.

Halpern and others have cataloged plenty of human behavioral differences. “These findings have all been replicated,” she says.

... “You see sex differences in spatial-visualization ability in 2- and 3-month-old infants,” Halpern says. Infant girls respond more readily to faces and begin talking earlier. Boys react earlier in infancy to experimentally induced perceptual discrepancies in their visual environment. In adulthood, women remain more oriented to faces, men to things.

All these measured differences are averages derived from pooling widely varying individual results. While statistically significant, the differences tend not to be gigantic. They are most noticeable at the extremes of a bell curve, rather than in the middle, where most people cluster. ...

See also Gender differences in preferences, choices, and outcomes: SMPY longitudinal study. These preference asymmetries are not necessarily determined by biology. They could be entirely due to societal influences. But nevertheless, they characterize the pool of human capital from which Google is trying to hire.
The recent SMPY paper below describes a group of mathematically gifted (top 1% ability) individuals who have been followed for 40 years. This is precisely the pool from which one would hope to draw STEM and technological leadership talent. There are 1037 men and 613 women in the study.

The figures show significant gender differences in life and career preferences, which affect choices and outcomes even after ability is controlled for. (Click for larger versions.) According to the results, SMPY men are more concerned with money, prestige, success, creating or inventing something with impact, etc. SMPY women prefer time and work flexibility, want to give back to the community, and are less comfortable advocating unpopular ideas. Some of these asymmetries are at the 0.5 SD level or greater. Here are three survey items with a ~ 0.4 SD or more asymmetry:

# Society should invest in my ideas because they are more important than those of other people.

# Discomforting others does not deter me from stating the facts.

# Receiving criticism from others does not inhibit me from expressing my thoughts.

I would guess that Silicon Valley entrepreneurs and leading technologists are typically about +2 SD on each of these items! One can directly estimate M/F ratios from these parameters ...
For example, if a typical SV entrepreneur / tech leader sits at roughly +2 SD on these traits relative to the male distribution, then a 0.4-0.5 SD shift in means places the same absolute threshold at roughly +2.5 SD in the female distribution, and the fraction of the population above threshold is 3 to 4 times larger for males. This doesn't mean that the females who are above +2.5 SD (in the female population) are ill-suited to the role (they may be as good as the men), just that there are fewer of them in the general population. I was shocked to see that even top Google leadership didn't understand this point that Damore tried to make in his memo.
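The 3:1 to 4:1 figure follows directly from Gaussian tail areas. A minimal check, using the thresholds assumed above (+2 SD vs. +2.5 SD):

```python
import math

def tail_fraction(z):
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Same absolute threshold: +2 SD in one distribution, +2.5 SD in the
# other (means shifted by 0.5 SD).
ratio = tail_fraction(2.0) / tail_fraction(2.5)  # ~3.7
```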

A 6ft3 Asian-American guard (Jeremy Lin) might be just as good as other guards in the NBA, but the fraction of Asian-American males who are 6ft3 is smaller than for other groups, like African-Americans. Even if there were no discrimination against Asian players, you'd expect to see fewer (relative to base population) in the NBA due to the average height difference.

Behold the Brogrammer: James Damore (Bloomberg video)

Watch a few minutes of this Bloomberg interview and I think you'll agree he's both sincere and well-meaning, if a bit naive about the buzzsaw he has stepped into. Definitely not a brogrammer.

He reminds me of Richard Hendricks of the HBO show Silicon Valley.

See also Damore vs Google: Trial of the Century? and In the matter of James Damore, ex-Googler

Blog Archive