Garage of Code: 2020

Monday, 9 March 2020

COVID19: How many unconfirmed cases are there?

We have data on the number of confirmed cases and on the delay from onset to confirmation.
We use this to estimate a best-case scenario for the number of unconfirmed cases.

Answer: in the best case, the total number of cases in Europe and the US is 2 times the number of confirmed cases. So double the official figure. The situation is better in Asia: Japan and South Korea have a best-case bound of about 10% extra.

More in-depth information about the method is below the graphs.

[Graphs, one per country: Belgium, France, Germany, Iran, Italy, Japan, China, Netherlands, Norway, Singapore, South Korea, Spain, Sweden, Switzerland, United Kingdom, United States]

Method

Suppose that the pattern of delay from onset to confirmed infection with COVID19 is as in:

For each country, we have a time series of the number of confirmed cases; call these Y. The total number of cases with symptoms each day is our unknown; call these X. For each day, there is one observation and one unknown. We also have extra unknowns at the start, because of the delay. So there are many sets of X that will fit the data Y.

How to choose between them? Instead of picking the model for X that I think is most likely, I pick the model that gives the most actionable information. Actionable information in this case is that we know that at least so-and-so many are infected. If the best case is bad, and it is, then we know that inaction is not an option.

Choose the most optimistic X by penalizing all x values over 0 in the fit model. Implicitly, this means that we think that the most likely world is one where there are no new cases each day. This way, the data does the whole job of "lifting" up our estimation of X.
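A minimal sketch of such a fit, under assumptions that are not stated in the post: Y is taken to be daily new confirmations, the delay pattern is a list of shares per day after onset, the penalty is linear in X, and the fit is done with plain projected gradient descent. The names `expected_confirmations` and `best_case_onsets`, the penalty weight, and all numbers are hypothetical.

```python
def expected_confirmations(onsets, delay):
    """Confirmations per day implied by daily onsets.

    delay[j] is the share of cases confirmed j days after onset.
    onsets starts len(delay) - 1 days before the first confirmation day.
    """
    k = len(delay)
    return [
        sum(delay[j] * onsets[t + k - 1 - j] for j in range(k))
        for t in range(len(onsets) - k + 1)
    ]


def best_case_onsets(confirmed, delay, penalty=1e-3, iterations=20000):
    """Projected gradient descent on
    0.5 * sum((predicted - confirmed)^2) + penalty * sum(x), with x >= 0."""
    k = len(delay)
    x = [0.0] * (len(confirmed) + k - 1)  # extra unknowns at the start
    step = 1.0 / sum(delay) ** 2  # conservative step size for this convolution
    for _ in range(iterations):
        predicted = expected_confirmations(x, delay)
        residual = [p - y for p, y in zip(predicted, confirmed)]
        grad = [penalty] * len(x)  # gradient of the linear penalty
        for t, r in enumerate(residual):
            for j in range(k):
                grad[t + k - 1 - j] += delay[j] * r
        # Gradient step, then project back onto x >= 0.
        x = [max(0.0, xi - step * gi) for xi, gi in zip(x, grad)]
    return x


# Made-up numbers, for illustration only.
delay = [0.2, 0.5, 0.3]
confirmed = [1.0, 2.0, 4.0, 7.0, 12.0, 20.0]
onsets = best_case_onsets(confirmed, delay)
print("best-case total onsets / confirmed:", sum(onsets) / sum(confirmed))
```

Because every positive x costs something, the fitted onsets stay at zero unless the confirmed counts force them up, which is what makes the result a lower bound rather than a most-likely estimate.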

References

Johns Hopkins CSSE GitHub

COVID19: How delayed are confirmed cases?

Answer: about a week.
Over 90% of the data is from Asia.

data: Johns Hopkins via Kaggle

code: get_delay2freq

COVID19: what is the growth rate?

Answer: the growth rate is about 1.3 per day for Europe and the US, meaning the number of cases is multiplied by 1.3 each day. This means a doubling of the number of cases every 2-3 days. Growth is slower for Japan and Singapore. South Korea also shows signs of slower growth. More in-depth information about the method is below the graphs.

Method

data: Johns Hopkins via Kaggle as of 2020-03-07

Cases are reported daily.

The model applied is least squares on the logarithm of number of cases. The code:

The model is applied from the day when there are at least 70 reported cases.
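As a rough sketch of what such a fit can look like (not the original code, which is not reproduced here): an ordinary least-squares line through the logarithm of the case counts, starting at the first day with at least 70 cases. The function names and the sample series are hypothetical.

```python
import math


def daily_growth_rate(cases, threshold=70):
    """Fit log(cases) = a + b * day by least squares and return exp(b)."""
    # Start from the first day that reaches the threshold.
    start = next(i for i, c in enumerate(cases) if c >= threshold)
    ys = [math.log(c) for c in cases[start:]]
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    return math.exp(slope)


def doubling_time(rate):
    """Days needed to double at a given daily growth factor."""
    return math.log(2) / math.log(rate)


# Synthetic series growing by a factor 1.3 per day, for illustration only.
cases = [12, 40] + [100 * 1.3 ** day for day in range(10)]
rate = daily_growth_rate(cases)
print(rate, doubling_time(rate))
```

A daily factor of 1.3 gives a doubling time of ln 2 / ln 1.3, which is between 2 and 3 days.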

Tuesday, 3 March 2020

Political intolerance and dating pt. 2: Chemistry

Part 1

Women lean left. Men lean right. Will this be a problem for future marriages in Sweden?

In the previous post, we applied a model from computer science to see if it would be possible to match together all young men and women in Sweden, without political awkwardness. Answer: almost. But love isn't computer science. Love is chemistry. So, let's make a chemical model instead.

The Idea

Forget political parties for a minute. The process by which (heterosexual) men and women form and break up relationships can be seen as two reactants forming a product molecule in a solution:

$A + B \rightleftharpoons AB$

The reaction goes both ways: in equilibrium new products are formed as fast as old ones are breaking up. The key variable in this process is the equilibrium constant*. The equilibrium constant is:

$K = \frac{[AB]}{[A][B]}$

The brackets mean concentrations. What value does K have in real life if A = single women, B = single men, and AB = couples? Let's be fairly optimistic and say that, taking into account every variable that affects this except politics, 80% of people will be in a long-term relationship at equilibrium. I get that figure from the fact that about 80% of all women become mothers at some point during their life. Then A=0.1, B=0.1, and AB=0.4 (since a couple counts as one in chemistry). Therefore, K=40.

*The speed at which the solution converges to the equilibrium is also very important, but here we will just assume that all processes are in equilibrium (though of course life happens before the asymptote, to quote Taleb).
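The arithmetic behind K=40 can be written as a small helper. This is only a sketch; the function name `equilibrium_constant` is hypothetical, and it assumes, as above, that singles are split evenly between women and men and that a couple counts as one unit.

```python
def equilibrium_constant(single_fraction):
    """K = [AB] / ([A][B]) for a given share of singles in the population."""
    single_women = single_fraction / 2
    single_men = single_fraction / 2
    couples = (1 - single_fraction) / 2  # two people form one couple
    return couples / (single_women * single_men)


for share in (0.2, 0.5, 0.02):
    print(share, equilibrium_constant(share))
```

With the same convention, a baseline of 50% singles corresponds to K of about 4, and a baseline of 2% singles to K of about 4900.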

The Model

We will use the same binary political tolerance model as before, so as to limit the number of assumptions, and make this model comparable to the previous one. The matrix of political tolerance for dating:

Binary tolerance model. Green: "fine, I can live with your views". Yellow: at least one partner says "we're going to have a problem here". Note that the model doesn't assume whether the intolerance goes both ways, as long as I can imagine at least one direction of intolerance. Therefore, the matrix is necessarily symmetric.

The motivation for using binary tolerance rather than ranges, rankings, or something similar is that I want to limit the number of design parameters in order to limit the impact of my own biases here. The motivation for assuming intolerance between V and {C, L} is that L's election campaign in 2018 was "anti-extremist", which is code for working against both SD and V. So the people who voted L a year and a half ago apparently did not have a problem with that. And C is a lot like L.

Another motivation for using a binary tolerance model rather than something more granular is the assumption that value questions come up quite early during dating, and if the values do not match then chances of going forward are not good. If the values do approximately match however, then it should be possible to reconcile political difference in good faith, and other variables are more important to whether the relationship will hold.

For each politically feasible couple, we get a reaction formula, such as:

$C_w + S_m \rightleftharpoons C_w S_m$

for Center women and Social democrat men. This gives us the equilibrium equation:

$K[C_w][S_m] = [C_w S_m]$

where K is the equilibrium constant, assuming no political troubles.
The variables in this model are the concentrations at equilibrium. So there will be one variable for each "single" type, such as Cw, and one for each "couple" type, such as CwSm. We need to preserve the total number of people of course, which gives us equations like:

$[C_w] + [C_w MP_m] + ... + [C_w KD_m] = P(C_w)$

Where P is the total concentration of that group, out of both men and women. In the case of Center women, P(Cw) = 16.8% / 2 = 8.4%.

Now we have one equation per single type, and one per couple type, which means one per variable, so the model should be fully constrained. There might exist some satisfying solutions involving negative concentrations, but we can easily constrain concentrations to be in the interval [0, 1].

This can be implemented as a quadratically constrained program (QCP) with a trivial objective (no optimization, we're just looking for a solution).

The model solves beautifully with Couenne [1][2] in about 100 ms.
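For readers without Couenne at hand, the same equilibrium equations can be approximated with a plain fixed-point iteration. This is a swapped-in technique, not the QCP formulation used for the results in this post. Substituting the equilibrium equation into the conservation equation gives [W] = P(W) / (1 + K * sum of tolerated single men), and likewise for men. The function name `solve_equilibrium` and the two-party toy data below are hypothetical.

```python
def solve_equilibrium(women, men, tolerant, k, iterations=2000):
    """Return concentrations of single women and single men per party.

    women, men: dict party -> total concentration of that group
    tolerant:   set of (woman_party, man_party) pairs that can form couples
    k:          equilibrium constant without political troubles
    """
    single_w = dict(women)  # start with everybody single
    single_m = dict(men)
    for _ in range(iterations):
        for w in women:
            partners = sum(single_m[m] for m in men if (w, m) in tolerant)
            single_w[w] = women[w] / (1 + k * partners)
        for m in men:
            partners = sum(single_w[w] for w in women if (w, m) in tolerant)
            single_m[m] = men[m] / (1 + k * partners)
    return single_w, single_m


# Toy example with two made-up parties; "Left" women do not date "Right" men.
women = {"Left": 0.3, "Right": 0.2}
men = {"Left": 0.2, "Right": 0.3}
tolerant = {("Left", "Left"), ("Right", "Left"), ("Right", "Right")}
single_w, single_m = solve_equilibrium(women, men, tolerant, k=40)
for party in women:
    print(party, "women single:", single_w[party] / women[party])
```

The couple concentrations follow from the equilibrium equation, for example K times the single "Left" women times the single "Left" men. With a single party and full tolerance the sketch reproduces the 20% baseline of singles for K=40.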

Results

So, what does the equilibrium look like? Starting with a non-political match rate of 80% we get:

Equilibrium matching pattern using the "chemical" model. Line width in the middle is proportional to the commonness of that couple. Singles are not shown.

So the political intolerance increases the number of equilibrium singles from 20% to 31.2%. Who gets left out?

V, women   57.1%
MP, women  41.8%
S, women   26.4%
C, women   26.0%
L, women   26.0%
M, women   14.8%
KD, women  15.4%
SD, women  17.0%

V, men     19.3%
MP, men    15.9%
S, men     14.8%
C, men     24.7%
L, men     24.7%
M, men     24.7%
KD, men    32.5%
SD, men    50.8%

Discussion

Wow! Let's analyze. Remember the baseline number of singles is 20%, assuming no politics. Some groups actually go under 20% here, the left leaning men and the right leaning women. So being atypical for your gender can increase one's chances. It's good that the incentive is towards reducing the divide!

The groups on the edges do much worse. Over 50% singles, wow... I'm looking at it and thinking: are the assumptions reasonable here? Could it really be that people in these groups will spend on average half their adult life alone? It's a sad outlook. What will be the consequence of this, if true? More international marriages is an obvious one: if one can't find a partner with matching values within the "Swedish" population, then one can try to find a foreigner. There are plenty of women with more conservative views in Eastern and Central Europe, but where are the foreign men that are as left leaning as Swedish women?

When I started this analysis, I expected dire results for the SD men, not being able to find a partner. My takeaway from this is that there must also be a lot of frustration among V women who cannot find a partner with their values.

Alternative Parameter Values

What happens to matchings if we change our assumptions about equilibrium singles without the political dimension? Assuming a future moral / economic situation like Japan with 50% equilibrium singles, we get:

V, women   80.8%
MP, women  72.0%
S, women   58.6%
C, women   59.6%
L, women   59.6%
M, women   47.7%
KD, women  49.8%
SD, women  53.8%

V, men     59.4%
MP, men    51.6%
S, men     48.0%
C, men     58.1%
L, men     58.1%
M, men     56.2%
KD, men    63.9%
SD, men    76.4%

This scenario has a total of 62.4% equilibrium singles.
What about a much more optimistic moral and economic scenario? (morally favourable, in the sense that it incentivises marriages). Let's consider a world like 1950's American suburbia, with only 2% equilibrium singles:

V, women   24.7%
MP, women  8.1%
S, women   2.8%
C, women   2.3%
L, women   2.3%
M, women   0.5%
KD, women  0.5%
SD, women  0.6%

V, men     0.6%
MP, men    0.5%
S, men     0.5%
C, men     2.6%
L, men     2.6%
M, men     3.3%
KD, men    5.8%
SD, men    19.4%

This scenario has 7.6% equilibrium singles. We see that even in this scenario that is very optimistic with regards to matching, people at the edges still spend a lot of time alone!

References

[1] Couenne, a free open source MINLP solver https://github.com/coin-or/Couenne
[2] I use Couenne by wrapping Pyomo http://www.pyomo.org/

Political intolerance and dating

In February, the following survey came out. It shows the political party sympathies among Swedish 18-29 year olds, broken down by males and females.

Political polls Sweden February 2020. Females 18-29 (left), and males 18-29 (right)

Key to the parties (links are to the corresponding party groups in the EU parliament)

V: Socialist (GUE/NGL)

S: Social democrat (S&D)

MP: Environmental/ green (Greens-EFA)

C: Decentralist liberals (RE/ALDE)

L: Liberals (RE/ALDE)

M: Liberal conservative (EPP)

KD: Christian democrat (EPP)

SD: Conservative reformist (ECR)

Övr: all other parties (not currently in parliament)

When I saw this I thought: "Wow! That is not going to work out. Given some reasonable assumptions about which political parties do not tolerate each other, it won't be possible to match together boys and girls that accept each other's political views". Let's turn this into an optimization model.

Political intolerance model

Between which parties is the difference in values so big that it will cause a strain on a romantic relationship? Without further motivation, here is the map of tolerance that I use, based just on experience from living in Sweden.

Matching model

In order to get an upper bound to how many people can be matched, we assume that we create a new "Authority of Politically Stable Marriages" that matches people in a top-down fashion. This is preposterous of course, but the point is to see if it will be possible to match everyone even under these very favourable circumstances (in terms of number of matches). We turn this into a network flow model.

Network flow model of political matchmaking. Each party has a male and a female node. The nodes to the terminals (red) have a capacity proportional to the poll results of that party. The grey nodes have infinite capacity. Note that the two sides are not fully connected to each other, but only those nodes that tolerate each other according to the matrix above.

Now we only need to solve a network flow problem. There are specialized algorithms for that, but we can also use a Linear Programming model. The flow is restricted to go from female to male. The only constraint is that inflow should equal outflow in all nodes except in the terminals. The optimization objective is the total outflow (which should be equal to the total inflow).

This may not be the most efficient way for large graphs, but this is very explicit, modifiable, and only takes a minute to code.
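For comparison, here is a sketch of one of the specialized algorithms mentioned above (augmenting paths found by breadth-first search, also known as Edmonds-Karp), instead of the Linear Programming formulation. The function name `max_matching` and the party sizes are made up for illustration and are not the poll numbers.

```python
from collections import deque


def max_matching(women, men, tolerant):
    """Maximum number of couples, plus unmatched counts per group.

    women, men: dict party -> number of people
    tolerant:   set of (woman_party, man_party) pairs that accept each other
    """
    capacity = {}

    def add_edge(u, v, c):
        capacity.setdefault(u, {})[v] = c
        capacity.setdefault(v, {}).setdefault(u, 0)  # residual edge

    for party, count in women.items():
        add_edge("source", ("w", party), count)
    for party, count in men.items():
        add_edge(("m", party), "sink", count)
    for w, m in tolerant:
        add_edge(("w", w), ("m", m), float("inf"))

    total = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {"source": None}
        queue = deque(["source"])
        while queue and "sink" not in parent:
            u = queue.popleft()
            for v, c in capacity[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "sink" not in parent:
            break
        path, v = [], "sink"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(capacity[u][v] for u, v in path)
        for u, v in path:
            capacity[u][v] -= push
            capacity[v][u] += push
        total += push

    unmatched_women = {p: capacity["source"][("w", p)] for p in women}
    unmatched_men = {p: capacity[("m", p)]["sink"] for p in men}
    return total, unmatched_women, unmatched_men


# Made-up example: V women only tolerate C men.
women = {"V": 50, "C": 50}
men = {"C": 30, "SD": 70}
tolerant = {("V", "C"), ("C", "C"), ("C", "SD")}
print(max_matching(women, men, tolerant))
```

Integer head counts are used here to keep the arithmetic exact; poll percentages could be scaled to integers first.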

Results

So, will it be possible to match young men and women romantically without political strife? Answer: almost!

Assignment of couples that maximize the number of matches. The total match rate is almost 98%. The width of the lines in the middle is proportional to the number of couples of that kind.

Of course, this raises the question: who gets left out? The table below shows the share of each group that does not get matched:

V, women:   8.8%
V, men:     0.0%
MP, women:  0.0%
MP, men:    0.0%
S, women:   0.0%
S, men:     0.0%
C, women:   0.0%
C, men:     28.0%
L, women:   0.0%
L, men:     0.0%
M, women:   0.0%
M, men:     0.0%
KD, women:  0.0%
KD, men:    0.0%
SD, women:  0.0%
SD, men:    0.6%

What!? I was certain that the extremes of the political spectrum would have the most trouble, even in this simplified model. Here, the shock seems to be absorbed by the men who vote center liberal. Does that mean that there is a bug? Not necessarily, since we haven't encoded any nuanced preferences: only binary tolerance. The solution can be altered to be less "surprising" to our preconceived ideas without reducing the objective. For example, the M women are now all matched to SD men, and a portion of these could be transferred to the C men, so that these are completely matched. Our model currently does not take into account whether that is on average a preferable transfer or not.

Except for that odd result, the only groups that have unmatched members are indeed women to the far left and men to the far right, as would be expected.

Model uncertainties

We have not looked at the "Other" category. Instead, the opinions of each gender were normalized to sum to 100 percent. Implicitly, this assumes that the males who say "Other party" are distributed along the political spectrum the same way as the other males (and vice versa for females). This assumption is probably quite wrong, since those who answer "Other" are more likely to have extreme opinions, in either direction. Since 4.9% of males and 2.1% of females say "Other", this is a large source of uncertainty in the model.

Another point of uncertainty is that not all who answered were heterosexuals, but we're using the polls to estimate the feasibility of heterosexual matching. The implicit assumption made is that political opinion does not correlate with sexuality. Once again, probably not true.

Of course, a final important implicit assumption is that the number of women and men is the same! This balance may be upset if either gender is more likely to emigrate.

The Big Problem

The big problem with this model is that relationships are not created by a central authority. It is more realistic to model relationships with a bottom-up model that does not have a global objective, but rather just local equilibrium constraints. That is to say: a person wants to stay in a relationship if there is no better alternative. A relationship is stable if neither partner has a better alternative. So we should solve for that equilibrium instead. This will be the topic of the next post.

Wednesday, 29 January 2020

Minesweeper pt. 3: more gameplay

See part 1 for a description of Tile Domination and Gradient.

Adding the feature from part 2, which guarantees that the instances are solvable, makes the game much more satisfying. With this knowledge, I could spend all the time thinking of clever inferences, instead of wondering whether the instance is solvable at all. This post contains some interesting cases that I ran into.

Tile Domination

Typical tile domination. The 2 dominates the upper 1 (sees all tiles of it, plus some more)

Tile domination with very good payoff! The yellow tiles contain exactly one mine.

A nice interdependence between the top three middle tiles gives us that the red tile is a mine. Could also be solved with tile domination between the "1" and one of the "2"s.
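A small sketch of the domination inference, assuming it means what the first caption suggests: one number tile sees every unknown tile that another sees, plus some more. The full definition is in part 1, so the function name `tile_domination` and the exact form here are an illustration, not the game's implementation.

```python
def tile_domination(big_tiles, big_mines, small_tiles, small_mines):
    """Infer the mine count in the tiles that only the dominating tile sees.

    big_tiles, small_tiles: sets of unknown neighbour coordinates
    big_mines, small_mines: mines still missing around each number tile
    Returns (extra_tiles, mines_in_extra), or None if there is no domination.
    """
    if not small_tiles <= big_tiles:
        return None
    extra = big_tiles - small_tiles
    return extra, big_mines - small_mines


# The "2" sees three unknown tiles, the "1" sees two of them.
two_sees = {(0, 0), (0, 1), (0, 2)}
one_sees = {(0, 0), (0, 1)}
extra, mines = tile_domination(two_sees, 2, one_sees, 1)
if mines == 0:
    print("free tiles:", extra)
elif mines == len(extra):
    print("mines:", extra)
```

When the count is zero the extra tiles are free, and when it equals the number of extra tiles they are all mines; anything in between only narrows things down.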

Gradient

A hard to spot application of the gradient rule. The yellow tiles must contain exactly one mine.

Another hard to spot gradient, between the middle "1" and the middle "3". The yellow tiles contain exactly one mine.

Global constraint

(counting total number of mines)

Typical application of the global constraint. The colored area is the only unknown area left in the game. The yellow, orange, and turquoise tiles contain exactly one mine each. Since there are only three mines left in the game, the green tiles must be free.

A larger application of the global constraint. The yellow and orange tiles must contain exactly two mines each. The turquoise tiles contain exactly one mine. Therefore, the green tiles are free.

Applying the global constraint with two remaining separate areas. There is just one way to assign two mines so as to "touch" all unsatisfied tiles.

Multi-step inferences

Shared tile domination: the "2" on the lower left must be satisfied by the combined yellow and orange tiles.

Proving that the "box" is filled diagonally. I have marked colored boxes around the tiles that yield the conclusions about the colored tiles. The order of reasoning goes first: yellow, orange, purple, and turquoise must all contain one mine, by straight application of adjacency constraints. Then brown is inferred to contain one mine based on orange, purple, and turquoise. With this, we see that the green tiles are free.

Yellow, turquoise, and brown fields contain one mine by adjacency constraints. From yellow, we get the orange by tile domination. Using the same reasoning about the box as above, the purple must contain one mine, so the green tiles are free by saturation of the right "1". Using the box in the other direction, the tile directly above the left "3" must be free.
