trendline gaussian

  • Re: trendline gaussian


    ok think i understand the gamma distribution now, but still I'm juz curious about how the mean, sd, and scaling is obtained for gaussian-fit.xls? and why is scaling required? Must we do scaling for all sorts of distributions?


    i don't really need to use gaussian distribution here but just wan to clarify my doubts. anyone?


    then again, A., can pls enlighten me on how u choose the values of alpha, beta, scaling and x-shift?


    your help is deeply appreciated.

  • Re: trendline gaussian


    Benj,


    In the case of the Oil production we wanted a curve that was 'bell shaped'. Have a look at teh attached sheet that shows four different Gamma Dist. All are possible and all could apply, but we wanted something like the first or second.


    Now look at the values on the axes. The x-axis is generraly below 30 and the y-axis are all below 1, infact closer inspection below 0.5 (although this can change).


    So the Oil data had x-values between 1850 and 2050, these need to be transformed to 0 to 30.


    The Production numbers were in millions so I scaled them to give me a smaller number in the range for the Gamma Distribution of the form that I wanted.


    These steps are known as transforming your data. (I suspect that you will not have to do this).


    Now I sit and play with the a and b values till I get roughly the right shape and then let solver get the best fit.


    However, if you have read the whole thread you will see that I stated early on that this was not teh best approach. Assuming that you are studying statistics at either 'A' level (High School/Junior College?) or degree level then you need to make sure that you are answering the correct question and using the correct methods.


    If you are really stuck go and ask your teacher lecturer to explain it again to you. Take with you what you have done, including the print outs from the site I gave you and the workbook attached and your text book (have you got the recommended text book?). Remember, the chances are that the teacher/lecturer loves this subject (even if you don't). Asking someone about what they love is a gift. They will almost certainly be willing to spend the time with you.


    You could spend hours asking us (and we will continue to try and help) but you might only spend 30 minutes with a teacher and have it all sorted out!


    If the teacher will not help then please post the actual question (in full) and data and I will see what I can do to help you solve it.


    Good luck,


    A.

  • Re: trendline gaussian


    hi, thanks for ur v informative reply...


    i'm just confirming whatever I have understood from your post to see if i'm right.


    for the values of alpha and beta are chosen such that it best suits the required curve, which in the case of the Oil.zip is 'bell-shaped'.


    just for curiousity sake, how did u scale down ur values for x and y axis?


    well, not that i'm lazy to go and consult my teacher but this project is offered by an external agency to the school. my teacher is only checking on me, and it isn't his field of expertise.. as such, i nd to source ard on the net for advice. hope u understand

  • Re: trendline gaussian


    Benj,


    Please post the data that you have so that I can advise you.


    The scaling was undertaken by multiplying the data by a value. You asked questions on it in an earlier post.


    Please post your data so that I can help you. If you do not post the data I will find it very difficult to guide you in teh right direction.


    Thanks,


    Alan.

  • Re: trendline gaussian


    hi


    attached is the input data that I need to use. I'm supposed to use the Operational Risk Loss Distribution Approach (not sure if u r familiar with it) and using severity distribution which includes distributions like gamma, weibull etc and fit the data into a statistical distribution.


    the x-axis will be the range of values of the input data while that of the y-axis will be the probability density.

  • Re: trendline gaussian


    Benj,


    I have had a quick look at your data.


    I am not familiar with the specifics of the "Operational Risk Loss Distribution Approach" but a quick Google has shown me that the data you have is probably the output from some form of stochastic risk model. You are being asked to fit a curve that shows the distribution of risk and the probability of a given cost being exceeded.


    By adding columns to the data that count the number of records less than each value then copying these counts and the raw data to new columns and sorting on the counts you can get an estimate of the cumulative curve. Plot the values on the x-axis and the counts on the y-axis and you will see a curve that is not very pretty. Then log the x-axis and you will see a nice smooth s-curve. This immediately suggests that we are looking to transform our data using a logarithmic function. Note however that since the data includes some negative values we will need to adjust the function to deal with these.


    Can you get this far?


    A.

  • Re: trendline gaussian


    yes, i understand your post.... just wondering, is there any excel functions or macros that allow me to

    Quote

    adding columns to the data that count the number of records less than each value then copying these counts and the raw data to new columns and sorting on the counts

    ??


    and also how can i adjust the log function that you mentioned?

  • Re: trendline gaussian


    Benj,


    I am just about to leave the office.


    To implement the things in my previous post.
    1. In the column adjacent to your data use the formula CountIf to find the number of records that are less than each data point.
    2. In the next column put a formula that calculates the % (a number between 0 and 1) of the total number of records that are below that data value.


    3. Copy all three columns and paste special values
    4. Now sort all three new columns on the %
    5. Plot a chart of the data using the sorted raw data as the x and the % as the y data.
    6. Change the x axis scale to be a log scale.


    Now we have a graph of the data.


    If you are not familiar with doing this then please say so.


    We now need to fit a curve to your data.


    You have previously stated that you need to fit specific curves. Is this the case or are you trying to find the curve with the best fit?


    I am out of the office for a few days.


    You need to read your stats book on fitting continuous distributions. If you do not have this in your book then find one from the library that does. It is first year degree statistics.


    Good luck,


    Alan.

  • Re: trendline gaussian


    hi Alan,


    sorry, but I looked thru the Countif function, not so sure how to set the criteria part of the formula.. as in you were sayingto find the number of records less than the data point, how to determine the data point? what i thot was like COUNTIF(A1:A1000, "<1000"). so does it mean that the criteria is determined upon how i want my x-axis to be?


    Quote

    3. Copy all three columns and paste special values


    what special values do u mean? quite puzzled by this.


    and yes, I do new to find the curve with the best fit.... from wat i have gathered from books, i need to use the Goodness of Fit Test like Chi-square test etc.


    i'm sorry, tink i forgot to mention this at the start of my posts, but actually i need to implement all these in VBA code. but i suppose i can also record a macro to cover all these at the click of a button.

  • Re: trendline gaussian


    i got this off the net, but not so sure if it should be as such..


    i shd use COUNTIF(A1:A3772,A1:A3772) for all the rows of data as it will give an accurate count of how many times a values appear in the whole list of data.


    however, if i use tis, then i wont be able to determine the data point which u mentioned, which i interpretted as the values for the x-axis

  • Re: trendline gaussian


    Benj,


    Are you familiar with using $ signs in formulae?


    The formula that you will need will be something like
    =CountIf(A$1:A$3772,"<="&A1)


    Do a help search for "Paste Special", it is on the Edit menu.


    A.

  • Re: trendline gaussian


    hi, while you were away, i tried to figure things out.. but ur previous post was a great help.. and i managed to come up with the attached file. but my excel file is too big so i only managed to post a screen shot of my graph. btw, my graph is a XY (Scatter)


    wondering how can i generate the curves like you did in oil.zip, ie to have the curve depicting the input values and a few other curves showing gamma, sin curve etc? how do i generate the gamma and sin curve?

  • Re: trendline gaussian


    Benj,


    Now we are getting somewhere. What we have is the curve for the CDF (cumulative distribution function), but what we want is the pdf (Probability Distribution Function). Fortunately Excel provides both functions.


    What we can now do is one of two things.


    1. Discretise the data into bins such that we build a histogram of the pdf and try to fit that curve. That was the curve fitted in Oil.zip


    2. Avoid introducing errors in our analysis and fit acdf and get the parameters and then use that to generate the pdf.


    What we need to do is have a look at the cdf for a variety of functions and plot them for a variety of input parameters (known as a family of curves). You did this (?) for the GammaDist function, although we plotted the pdf, not the cdf.


    Try to create a workbook that has a graph of both the pdf and cdf for a theoretical dataset using the Weibull, NormalDist, GammaDist, LogNormDist functions. Then we can look at which one we actually need to use.


    Post the workbook back as a zip file when you have done this.


    A.

  • Re: trendline gaussian


    pls correct me if i'm wrong, i'm supposed to generate the pdf and cdf using the same set of data values but i will have to set the 4th input of each function to True for CDF and False for Pdf.


    From what I see in oil.zip, i need to divide the input values by a scaling factor. Is there any way to determine this factor? what if i need to write this into vba code? will it be able to determine the factor each time the program is run or will the factor be fixed?


    I'm just wondering, for values of alpha and beta, will I be able to dynamically determine their values each time i run my program that is written in vba? Or can these values be fixed for all cases of inputs?

  • Re: trendline gaussian


    Benj,


    Slow down. I think that you need to see the different shapes of teh functions that you are considering. Then you will undersstand when I suggest doing a particular transformation why I propose it.


    Have a look at the attached workbook, it shows the pdf and cdf for a family of curves for the Gamma and Normal distributions/functions.


    Now we need to work out what shape we think your data curve is....


    Once we have this tool then you can start playing with the parameters to get a best guess. Then we can put these best guesses into a model like Oil.Zip.


    HTH,


    Alan.

  • Re: trendline gaussian


    well i'm really grateful for all your help so far... i really learnt a lot here..


    took a look at the workbook. i need to do a cdf of my input values. By a glance, the Gamma distribution of a=15 b=1 fits my curve, but will need to affirm it again.


    say i want to include normal, gamma and the beta lines into my curve,
    so am i right to say that i merely hv to include into my chart the following 4 lines
    1) line of my input values
    2) Gamma curve which seems to best fit (1)
    3) Beta curve which seems to best fit (1)
    4) Normal curve which seems to best fit (1)


    and from there i do a visual comparison to see which one fits best to (1)?

Participate now!

Don’t have an account yet? Register yourself now and be a part of our community!