Hey, it's that guy that I am! Glad I could offer a concrete suggestion, but I wouldn't go so far as to call myself a professional, at least not until I have a job doing these things.
Power analysis is one of those things everyone's supposed to do, but you can never quite be sure who's doing it the right way unless you're a grand master at this stuff, which I'm certainly not. Even my grad course in biomedical statistics didn't cover it well, and there aren't many online resources that do a good job explaining both the theory behind power analysis and the tools that do it. That said, I think it's always easier to do something wrong and get feedback on how to correct it than to have nothing at all and ask how to do something. So let's see what we can get.
A basic power analysis involves four quantities:

- Effect size, aka d
- Sample size, aka n
- Significance level: P(Type I error), aka α
- Power: 1 − P(Type II error), aka 1 − β
Two of these you've essentially mentioned already. I say four quantities are "involved" because this is actually a basic constraint satisfaction problem: you enter any three of the values into your power analysis calculator and it gives you the value of the fourth variable needed to meet your needs. For example, if you know the significance level, statistical power, and effect size you're looking for, power analysis will tell you how big your sample needs to be.
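Since you're already working in Python, you can see this "enter three, solve for the fourth" idea in action before we even get to R. The statsmodels library exposes it directly: leave exactly one argument as None and `solve_power` fills it in. (The numbers below are just an illustration, not anything from your experiment.)

```python
# Sketch of the "three in, one out" constraint solving, using
# statsmodels' one-sample/paired t-test power calculator.
# The effect sizes and sample sizes here are illustrative only.
from statsmodels.stats.power import TTestPower

analysis = TTestPower()

# Known: effect size, alpha, power -> solve for sample size.
n = analysis.solve_power(effect_size=0.5, nobs=None, alpha=0.05, power=0.8)
print(f"pairs needed: {n:.1f}")  # ~33 pairs for a medium effect

# Known: sample size, alpha, power -> solve for detectable effect size.
d = analysis.solve_power(effect_size=None, nobs=20, alpha=0.05, power=0.8)
print(f"detectable effect size: {d:.2f}")
```

The R examples below do exactly the same thing, just with `NULL` marking the unknown instead of `None`.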
Research scientists may have fancy tools at their disposal, but fortunately, the tools for doing statistics are freely available. The statistics and plotting software du jour for many labs is [R][1], which can be called and used from a command line in much the same way as Python. People who prefer a GUI to go along with it tend to use [RStudio][2]. There's even a working R kernel for Jupyter notebooks, [IRkernel][3], which has a fairly straightforward installation process. This format should be familiar to you, since you already use IPython notebooks, and you can even display the output of both languages in the same notebook, as long as you don't delete the output and you switch to the correct kernel before running commands. For all the stuff below, I'm just going to be copying and pasting from my terminal.
The "pwr" package is a good basic starting point for doing power analysis and implements many of the calculations from [Cohen's classic 1988 book][4], which introduced power analysis to the behavioral sciences. Once you have R up and running, you can install the package with one line and import it into your current workspace with another.
install.packages("pwr")
library(pwr)
The last thing you need to decide is what kind of statistical test is most appropriate for your design. From a read of your post and the hypothesis you've put above, you're essentially looking to see whether the mean amplitude at the entrained frequency (we'll get to the question another time of whether entrainment at the beat frequency is a reasonable expectation) differs between the control condition and the binaural beat trial condition. Note that, of course, it isn't the place of these statistical tests to tell you whether or not your control condition was correct; that's a whole different area of experimental design. For these parameters, a paired two-tailed t-test seems to work just fine. The point of a t-test is to look for a difference in the means of two sample populations. If you want to do something else, you might want to look into ANOVA, Pearson correlation, or one of the other tests available with the pwr package.

We use two-tailed because, presumably, you'll consider it an effect if the binaural beats either increase or decrease amplitude at that frequency in the EEG readout, and indeed the direction of the effect, if it exists, might differ by frequency band. We also use the paired condition because your setup is a type of "repeated measure" on subjects in and out of the experimental condition of interest. In other words, because each subject serves as his or her own control, the "noise" factors that influence the trial condition are probably also present in the control condition. In general, this increases the statistical power of your design vs. comparing two independent populations whose noise factors are more difficult to correct for.
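For concreteness, here's what that paired two-tailed test looks like once you actually have data in hand. This is a Python sketch with invented amplitude numbers (the R equivalent is `t.test(trial, control, paired = TRUE)`):

```python
# Paired two-tailed t-test on made-up per-subject amplitudes.
# These numbers are invented for illustration, not real EEG data.
from scipy.stats import ttest_rel

control = [17.1, 16.8, 17.3, 16.9]  # amplitude at 10 Hz, control block
trial   = [19.0, 18.7, 19.4, 19.1]  # same subjects, binaural-beat block

# ttest_rel pairs each subject with itself and returns a two-sided p-value.
t_stat, p_value = ttest_rel(trial, control)
print(t_stat, p_value)
```

The pairing is the whole point: the test operates on the within-subject differences, which is why each subject acting as his or her own control buys you power.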
With that, we're ready to start crunching some numbers for your experiment. So why don't we start off with some back-of-the-napkin work based on the information in your post. You have two subjects in your sample, you want a significance level of .05 and a β of 0.2, meaning the desired power is 0.8. Let's see what the effect size would need to be for us to get useful results from that.
pwr.t.test(n = 2, d = NULL, sig.level = .05, power = .8, alternative = "two.sided", type = "paired")
Error in uniroot(function(d) eval(p.body) - power, c(1e-07, 10)) :
f() values at end points not of opposite sign
Well, crud. That error means uniroot couldn't find an effect size inside its search interval: it only looks between 1e-07 and 10, and with n = 2 there is no effect size of 10 or less that gets you to 80% power. Let's try a slightly lower power and see if that works better.

pwr.t.test(n = 2, d = NULL, sig.level = .05, power = .7, alternative = "two.sided", type = "paired")
Paired t test power calculation
n = 2
d = 9.340765
sig.level = 0.05
power = 0.7
alternative = two.sided
NOTE: n is number of *pairs*
Much better, now we have some real output. You'll notice the function returned the values I entered in the first place, but the value you want to pay attention to is effect size, d = 9.340765. In plain English, that means your experiment is underpowered unless the mean difference between your trial and control conditions is at least 9.34 times the standard deviation of those differences. I can't really say for sure off the top of my head, but that sounds physiologically impossible.
Don't worry about the NOTE at the end of the output, that's just a reminder on how to interpret the n here. It's saying we need 2 pairs of subjects, not just 2 subjects. In this case, each subject is his or her own partner, control vs. trial. I won't paste that part in future copy-pastes.
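Since everything below hinges on d, it's worth pinning down what it means for a paired design: the mean of the within-subject differences divided by the standard deviation of those differences. A quick sketch of the arithmetic, with invented numbers:

```python
# Cohen's d for paired data: mean difference / sd of the differences.
# The amplitude values are invented, just to show the arithmetic.
import statistics

control = [17.1, 16.8, 17.3, 16.9]
trial   = [19.0, 18.7, 19.4, 19.1]

diffs = [t - c for t, c in zip(trial, control)]
d = statistics.mean(diffs) / statistics.stdev(diffs)
print(round(d, 2))  # 13.5: a huge (and here, artificially consistent) effect
```

Notice that d blows up when the differences are very consistent across subjects, which is exactly why the paired design is more powerful than comparing independent groups.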
But we already knew the study was underpowered. This function just gives us a number to put to how much of an effect we'd need to see for this to work. Let's crunch some numbers that are a little more interesting. The biggest difference in PSD/Hz in your post seems to be around 10 Hz, where the trial condition gets to about 19 and the control looks to be 17. If we assume this to be a true effect, and the standard deviation of the differences is near 1, the effect size is about 19 − 17 = 2. Given two subjects, how much power does the experiment have?
pwr.t.test(n = 2, d = 2, sig.level = .05, power = NULL, alternative = "two.sided", type = "paired")
Paired t test power calculation
n = 2
d = 2
sig.level = 0.05
power = 0.1757074
alternative = two.sided
Well, that doesn't bode well. The output essentially means that if we have an effect size of 2, there's only about a 17% chance we'd detect it with this setup. All this really does is confirm your original description in the post that no significant effects were found, but it means that result could just be due to underpowering.
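If you want to sanity-check that number without leaving Python, statsmodels gives the same answer. Its `TTestPower` covers the paired case, because a paired t-test is just a one-sample test on the differences:

```python
# Cross-check of the pwr result: power of a paired t-test with
# n = 2 pairs and effect size d = 2, at alpha = .05 two-sided.
from statsmodels.stats.power import TTestPower

power = TTestPower().solve_power(effect_size=2, nobs=2, alpha=0.05,
                                 alternative='two-sided')
print(round(power, 3))  # ~0.176, matching R's pwr.t.test
```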
Without reviewing the literature on binaural beats and other calming/relaxing stimuli, it's hard to say just what effect size we should expect. Cohen, the guy who came up with power analysis, labeled effect sizes of 0.2, 0.5, and 0.8 as small, medium, and large, respectively, which would make d = 2 very large indeed, but Cohen was speaking in terms of the social sciences, which may not be appropriate here. Just out of curiosity, we can ask how many subjects we would need to run to be reasonably confident (sig. level = .05, power = 0.8) of detecting an effect size of 2, if it's real.
pwr.t.test(n = NULL, d = 2, sig.level = .05, power = .8, alternative = "two.sided", type = "paired")
Paired t test power calculation
n = 4.220731
d = 2
sig.level = 0.05
power = 0.8
alternative = two.sided
Fancy that. Assuming seeing a simple difference in mean value at the peak is what you're after, you should be able to detect an effect pretty easily, if it exists, with just five subjects. Honestly, this is much more optimistic than I expected, and though I've seen consumer-grade tests with as few as 7 subjects, I feel like this might be due to me using inappropriate effect sizes or the wrong statistical test. There's a lot to be said about interesting ways to analyze time series and electrophysiological data like EEG, so there is almost certainly a better way to formulate a hypothesis about the effects of binaural beats that fits with a much more appropriate statistical test. Some of them probably rely less on sample size and would be easier to carry out; after all, one of the weaknesses of null hypothesis statistical tests is their reliance on sample size.
If you're interested in doing more with the R pwr package, I suggest you start with the very brief materials from the [Open Science Framework][5] and read this [overview of power analysis in R][6]. Each uses a slightly different definition of effect size, so it'll help you answer the question of which is most appropriate for you, and keep in mind that it's inextricably linked to how your hypothesis is formulated. Heck, OSF offers free statistical and methodological consulting for experiments and projects. Maybe you can buzz them and see if they'd help you out if you have specific questions.
As a cherry on top, I've gone ahead and made this little graph. It shows statistical power on the x-axis, and each line represents a different effect size. It'll give you an idea of how many samples you need for a given effect size and desired power. The long and short of it is that, if the effect is > 1, you probably don't need more than 10 subjects to get a well-powered study. Any less than that, and you probably need one to three dozen depending on the actual effect size. It's not much, but at least it'll help you plan a bit. You can find the code for making the plot here: https://gist.github.com/dafff23cc44215b25c15
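If you'd rather generate the numbers behind a plot like that yourself, here's a rough Python equivalent of the calculation (same paired t-test model as above; the grid of effect sizes is arbitrary, not a prediction):

```python
# Required number of pairs for 80% power across a grid of effect
# sizes: the raw numbers behind a power-vs-sample-size plot.
from statsmodels.stats.power import TTestPower

analysis = TTestPower()
needed = {}
for d in (0.5, 1.0, 1.5, 2.0):
    needed[d] = analysis.solve_power(effect_size=d, nobs=None,
                                     alpha=0.05, power=0.8)
    print(f"d = {d}: {needed[d]:.1f} pairs for 80% power")
```

Sweeping power instead of effect size, and handing the results to any plotting library, reproduces the curves in the graph.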
Let me know if there's any other way I can help.

Grant
[1]: http://www.r-project.org/
[2]: https://www.rstudio.com/
[3]: https://github.com/IRkernel/IRkernel
[4]: http://www.lrdc.pitt.edu/schneider/p2465/Readings/Cohen,%201988%20(Statistical%20Power,%20273406).pdf
[5]: https://osf.io/d9284/
[6]: http://www.statmethods.net/stats/power.html